4 The record articles

Understanding CMS Data Validation

Posted: February 15th, 2023

Authors: Tom C. 

Many facilities that are subject to the requirements of a Title V Operating Permit must operate and maintain continuous monitoring systems (CMS) to demonstrate ongoing compliance with various state and Federal requirements in the permit. Managing the quality and evaluating the validity of data collected by a CMS is one of the most important aspects of running a successful CMS program.  A myriad of factors can influence how CMS data validity is assessed, including but not limited to: the parameter being monitored, the type of CMS, the applicable regulations (a source that is subject to two or more regulations may have different monitoring requirements for each), CMS maintenance, and CMS malfunction.  The goal of this article is to summarize some of the concepts surrounding data validity and substitution and show how they come into play in a hypothetical scenario.

What is a CMS?

A CMS is a system that collects data used to demonstrate compliance with an applicable regulation on a continuous basis. CMS is a comprehensive term that covers several different types of monitoring systems.  Three of the most common types of CMS are continuous emissions monitoring systems (CEMS), continuous opacity monitoring systems (COMS), and continuous parameter monitoring systems (CPMS).

What is data validity? What is invalid data, and what causes data to be invalid?

Valid data is data collected by the CMS that can be used to demonstrate compliance with an emissions or operating limit.  For data to be valid, it must be quality-assured using the appropriate quality assurance procedures for the type of CMS and the applicable regulations, and it must also be representative of actual conditions in the environment being monitored (typically either in an exhaust stack or within a process or control device).

Invalid data is data collected by the CMS that is not considered quality-assured or representative of what is actually occurring in the environment being monitored.  A number of things can cause invalid data, but typical causes are CMS maintenance, CMS malfunctions, and quality assurance/quality control (QA/QC) issues on the system. Invalid data caused by the CMS failing QA or QC checks is considered out-of-control (OOC).

Data can also be invalidated by the way the compliance averaging period is evaluated.  For example, if the compliance average is 1 hour, some types of CMS produce a valid hour as long as there is at least one valid minute in each 15-minute period of that hour.  For other types of CMS, at least 75% of the minutes in the hour must be valid for the hour to count as valid data.  For longer averaging periods, validity can be defined in several ways.  For a 3-hour period, for example, the period can be configured to be valid when at least one hour is valid (one-of-three), at least two hours are valid (two-of-three), or all three hours are valid (three-of-three). A CMS monitoring one pollutant may be subject to several data validity requirements (for example, the state and Federal requirements might differ), so it is important to understand how these requirements differ and how each affects the data used for compliance determinations.
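
The hourly and multi-hour validity rules described above can be sketched in a few lines. This is a minimal illustration, not a DAHS configuration: it assumes each hour is stored as 60 one-minute readings with None marking an invalid minute, and the function names (hour_valid_quadrants, hour_valid_percent, block_valid) are hypothetical. Which rule and threshold apply to a given CMS depends on the applicable regulation.

```python
from typing import Optional, Sequence

# Hypothetical representation: one hour = 60 one-minute readings,
# with None marking a minute that is invalid (maintenance, malfunction, OOC).
Minute = Optional[float]


def _check_hour(minutes: Sequence[Minute]) -> None:
    if len(minutes) != 60:
        raise ValueError("expected 60 one-minute values")


def hour_valid_quadrants(minutes: Sequence[Minute]) -> bool:
    """Valid if every 15-minute period contains at least one valid minute."""
    _check_hour(minutes)
    return all(
        any(m is not None for m in minutes[q * 15:(q + 1) * 15])
        for q in range(4)
    )


def hour_valid_percent(minutes: Sequence[Minute], threshold: float = 0.75) -> bool:
    """Valid if at least `threshold` (default 75%) of the minutes are valid."""
    _check_hour(minutes)
    valid = sum(1 for m in minutes if m is not None)
    return valid / len(minutes) >= threshold


def block_valid(hour_flags: Sequence[bool], required: int) -> bool:
    """N-of-M rule for a multi-hour period, e.g. required=2 for two-of-three."""
    return sum(1 for ok in hour_flags if ok) >= required


# One valid minute per 15-minute period: passes the first rule, fails the 75% rule.
sparse: list = [None] * 60
for i in (0, 15, 30, 45):
    sparse[i] = 10.0
print(hour_valid_quadrants(sparse), hour_valid_percent(sparse))  # True False
```

The same minute data can therefore yield a valid hour under one requirement and an invalid hour under another, which is why a single CMS may need to track more than one validity result.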

What is done with invalid data?

Typically, a facility subject to continuous monitoring requirements must report to the governing regulatory agency the duration of the periods when invalid data were collected (and a compliance demonstration is therefore not available), though not necessarily the data values themselves. A period when a compliance demonstration is not available is considered downtime. Not all invalid data causes downtime; the amount of invalid data needed to cause downtime depends on the CMS and the applicable regulations. Depending on the type of CMS, downtime above certain thresholds may trigger the need to submit a more detailed report, or even result in fines.
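
A downtime threshold check usually comes down to comparing downtime against the hours the source actually operated. The sketch below makes a simplifying assumption (every invalid hour while the source operates counts as downtime, and non-operating hours are ignored); the HourRecord type, the downtime_summary function, and the 5% threshold in the example are hypothetical placeholders, so the real threshold and downtime definition must come from the applicable rule.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HourRecord:
    operating: bool  # the source operated during this hour
    valid: bool      # the CMS produced a valid hour under the applicable rule


def downtime_summary(
    hours: Sequence[HourRecord], threshold_percent: float
) -> Tuple[int, float, bool]:
    """Return (downtime hours, downtime as % of operating hours, threshold reached).

    Simplifying assumption: every invalid hour while the source operates counts
    as downtime; hours when the source is not operating are ignored.
    """
    operating = [h for h in hours if h.operating]
    down = sum(1 for h in operating if not h.valid)
    pct = 100.0 * down / len(operating) if operating else 0.0
    return down, pct, pct >= threshold_percent


# 100 operating hours (2 invalid) plus 10 shutdown hours.
period = (
    [HourRecord(True, True)] * 98
    + [HourRecord(True, False)] * 2
    + [HourRecord(False, False)] * 10
)
print(downtime_summary(period, threshold_percent=5.0))  # (2, 2.0, False)
```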

In some instances when a CEMS is being used, invalid data will be replaced with substitute values so that emissions can still be estimated.  The facility’s data acquisition and handling system (DAHS) should be configured to handle data invalidation and substitution based on the applicable regulation and the facility’s needs.

What is data substitution, and when does it come into play?

Data substitution is the practice of replacing invalid data with values estimated from other, valid data that was collected.  This is typically done to estimate mass emissions (e.g., tons of SO2 in a 12-month period).  Data substitution can be as simple as averaging the valid values immediately before and after the invalid period, or as complicated as the missing data procedures laid out in 40 CFR Part 75 (Part 75), which use statistical analysis of historical data to select substitute values.  Part 75 uses tiered data substitution: the more downtime a CMS has, the higher (more conservative) the substituted value used to estimate emissions.  The correct data substitution procedures depend on the applicable requirements for each CMS. Some regulations or permits require Part 75 data substitution, some require more general substitution, and some don’t allow substitution at all.
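
The idea behind tiered substitution (lower monitor availability leads to a more conservative substitute value) can be illustrated with a simplified sketch. The tiers below (95% and 90% availability, a 90th percentile, and a maximum of the lookback data) are hypothetical placeholders chosen only to show the structure; they are not the actual Part 75 procedures, and the function names are illustrative.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty lookback sample."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tiered_substitute(
    availability_pct: float,
    before: float,
    after: float,
    lookback: Sequence[float],
) -> float:
    """Pick a substitute value; lower availability -> more conservative value.

    The 95% / 90% tiers and the statistics used are HYPOTHETICAL placeholders,
    not the actual Part 75 missing data procedures.
    """
    if availability_pct >= 95.0:
        return (before + after) / 2
    if availability_pct >= 90.0:
        return nearest_rank_percentile(lookback, 90)
    return max(lookback)


history = [float(x) for x in range(1, 11)]  # hypothetical lookback values
print(tiered_substitute(97.0, 10.0, 14.0, history))  # 12.0
print(tiered_substitute(92.0, 10.0, 14.0, history))  # 9.0
print(tiered_substitute(80.0, 10.0, 14.0, history))  # 10.0
```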

The hypothetical scenario:

A facility must continuously monitor its process stack with NOX and SO2 CEMS.  NOX has an emission rate limit of 50 pounds per hour (i.e., lbs NOX/hr) on a 1-hour average.  SO2 has a mass emissions limit of 50 tons on a 12-month rolling sum basis.  It is important to distinguish between an emission rate limit and a mass emissions limit, since the distinction can determine whether missing data substitution is required.  In this scenario, the NOX emission rate in lbs/hr is the average rate of emissions during the hour, not the mass of NOX emitted during the hour.  If the source operated at an emission rate of 50 lbs NOX/hr but only operated for 30 minutes in that hour, the mass emissions would be only 25 lbs of NOX.  Because compliance with a rate limit is judged on the average rate itself rather than on an accumulated total, data substitution is not typically applied when complying with an average emission rate limit.
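
The rate-versus-mass arithmetic in this example can be written out directly. This is a small illustrative sketch; the function names are hypothetical, and it assumes that a rate equal to the 50 lbs/hr limit is compliant.

```python
def hourly_mass_lb(rate_lb_per_hr: float, operating_minutes: float) -> float:
    """Mass emitted in an hour = average rate x fraction of the hour operated."""
    return rate_lb_per_hr * operating_minutes / 60.0


def rate_complies(rate_lb_per_hr: float, limit_lb_per_hr: float = 50.0) -> bool:
    """The rate limit is judged on the average rate, not on the mass emitted.

    Assumption: a rate exactly equal to the limit complies.
    """
    return rate_lb_per_hr <= limit_lb_per_hr


print(hourly_mass_lb(50.0, 30))  # 25.0 lbs NOX emitted
print(rate_complies(50.0))       # True: the 50 lbs/hr rate is what is compared
```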

SO2, on the other hand, has a mass emissions limit of 50 tons on a 12-month rolling sum basis.  Because the limit is expressed as a sum, missing data must be addressed at the hourly level (i.e., you cannot sum zeros for hours when the source is operating and the CEMS are not functioning).  The facility uses hour before/hour after substitution procedures for SO2.  Suppose the facility experiences an issue in the sampling system that prevents sampling of stack effluent for two hours.  How should the facility assess its downtime and data substitution for this period?
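
A 12-month rolling sum can be sketched with a fixed-length window that drops the oldest month as each new month is added. This minimal example assumes each monthly SO2 total (in tons) already includes substituted values for any invalid operating hours, so no zeros stand in for missing data; the function name and monthly values are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def rolling_12_month_sums(monthly_tons: Iterable[float]) -> List[float]:
    """Sum of each trailing 12-month window, starting once 12 months exist."""
    window: deque = deque(maxlen=12)  # the oldest month drops off automatically
    sums: List[float] = []
    for tons in monthly_tons:
        window.append(tons)
        if len(window) == 12:
            sums.append(sum(window))
    return sums


LIMIT_TONS = 50.0
monthly = [4.0] * 11 + [4.5, 4.0]  # hypothetical monthly SO2 tons
for total in rolling_12_month_sums(monthly):
    print(total, "exceeds" if total > LIMIT_TONS else "within", "limit")
```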

First, the period when the sampling system was not operating must be invalidated.  The facility is allowed to use substituted data for SO2 for those two invalid hours, so it uses the average of the SO2 emission rates for the hour before and the hour after the event to report SO2 emissions for the two missing hours.  For NOX, the facility does not use data substitution procedures, since the limit is an emission rate (i.e., lbs NOX/hr) rather than a mass emissions limit (i.e., lbs of NOX emitted in the hour).
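
The hour before/hour after procedure in this scenario can be sketched as follows. The sketch assumes hourly SO2 values in lbs for full operating hours, with None marking an invalid hour; the function name and data are hypothetical, and it simply raises an error when a missing run has no valid hour on one side, since edge handling varies by rule and DAHS.

```python
from typing import List, Optional


def substitute_before_after(hourly: List[Optional[float]]) -> List[float]:
    """Replace each run of missing (None) hours with the average of the valid
    hour immediately before the run and the valid hour immediately after it."""
    filled = list(hourly)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        if start == 0 or i == n:
            raise ValueError("a missing run needs a valid hour on each side")
        value = (filled[start - 1] + filled[i]) / 2
        for j in range(start, i):
            filled[j] = value
    return filled


# Hypothetical hourly SO2 values (lbs) for full operating hours; two hours missing.
so2 = [100.0, 120.0, None, None, 140.0]
filled = substitute_before_after(so2)
print(filled)       # [100.0, 120.0, 130.0, 130.0, 140.0]
print(sum(filled))  # 620.0 lbs, versus 360.0 if the missing hours were summed as zero
```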

For both pollutants, the two hours in which the sampling system was having issues are considered CMS downtime, and the duration of the event will be recorded and reported to the governing regulatory agency.  However, substituted data is used only for the SO2 mass emissions (12-month rolling sum) limit, not for the average NOX emission rate limit.

What now?

Every facility that must maintain and operate a CMS will be subject to rules that determine data validity and specify whether a substitution methodology is allowed or required. The regulatory agency may also have a policy that determines when enforcement action is taken based on the amount of invalid data reported by a facility. It is important to accurately characterize your CMS data and the amount of invalid data on required reports, but it is not always straightforward. Please reach out to me at (610) 933-5246 or at tcunningham@all4inc.com for more information regarding this subject or any other general CMS questions you may have.
