How we measure tax gaps

Last updated 29 October 2023

What methods we use to estimate tax gaps.

About the methods

There are typically 2 broad methods to estimate tax gaps: top-down and bottom-up (shown in Figure 4).

  • Top-down methods use externally provided aggregated data sources to estimate the size of the tax base, from which we estimate the theoretical tax liability. The difference between the theoretical tax liability and the amount we receive is the estimated tax gap. A top-down approach is typically used for indirect taxes.
  • Bottom-up methods involve a detailed examination of data sources, such as tax returns, audit results (including random enquiry programs), risk registers or third-party data-matching information. We use this information to determine the extent of non-compliance across the whole population, from which we estimate the tax gap. A bottom-up approach is typically used for direct taxes. There are 3 types, as shown in Figure 5 and described further below.

Figure 4: Our 2 methods to estimate tax gaps

Figure 4: This image provides a visual overview of two general approaches applied to estimate tax gaps, top-down and bottom-up.
Top-down approaches are based on utilising external aggregated data to estimate the theoretical tax liability. This approach is typically suited to administrative and indirect taxes.
Bottom-up approaches are based on utilising internal administrative data to estimate the theoretical tax liability. This approach is typically suited to direct taxes.

Figure 5 below illustrates the various methods we used to estimate each gap we have published.

Figure 5: Method for each gap estimate

Figure 5: This image provides a visual overview of the four main methodological approaches that we use to estimate gaps, and places each of the published gaps under one of the four main methodological approaches. The gaps listed under the top-down approach are: fuel excise, PAYG withholding, goods and services tax, superannuation guarantee and luxury car tax. The gaps listed under the bottom-up model-based approach are: large corporate groups, large super funds, petroleum resource rent tax, tobacco, fringe benefits tax and product stewardship for oil fuel tax credits. Fuel tax credits and small super funds are both gaps that use a hybrid approach. The gaps listed under the bottom-up random enquiry program approach are: individuals not in business, small business, fuel tax credits and small super funds. The gaps listed under the final method, the bottom-up statistical approach, are high wealth private groups and wine equalisation tax.

Choosing the methodology

We choose the methodology that provides the most reliable estimate for each gap we measure. In doing this, we carefully consider the characteristics of each gap, including:

  • the design of the tax or program
  • the characteristics of the population
  • availability and quality of data.

Assessing these factors helps us decide which method is the most appropriate to use. For example, in order to use a top-down method we generally require external data. If we don't have a reliable external data source available, we know we'll need to use a bottom-up method to generate a reliable result.

We assess our methodologies for reliability, and where possible test them against alternatives to ensure that we are using the most appropriate methodology. We consult with our Engagement, advice and assurance on the options available to us. We also look to other jurisdictions to see what methodologies they use for similar gaps.

We continually work to update and improve our gap estimates. Part of this involves assessing the methodology used, to ensure it's still the most appropriate option. This means we can remain confident that our gap estimates are reliable and credible.

Gap approaches in detail

This section provides a more detailed explanation of the top-down and bottom-up methods we use to estimate tax gaps.

Top-down methods

A top-down method essentially looks at a system and breaks it down to understand each of its constituent parts and how these work individually. Top-down methods use external information about the system for which we are constructing an estimate.

This method does not always provide information on what drives the tax gap, but rather tells us that a gap exists. An example of this is the goods and services tax (GST) gap, which uses information collected through the Australian National Accounts data set. This data is collated by the Australian Bureau of Statistics (ABS) and, therefore, sits outside data collected by us – for example, audit data.
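The top-down calculation can be sketched in a few lines. This is an illustration only, not the production method: the function name, the single effective rate and the figures are all assumptions made for the example, and expressing the gap as a share of the theoretical liability is a common convention that is also assumed here.

```python
def top_down_gap(tax_base, effective_rate, amount_received):
    """Illustrative top-down estimate (hypothetical inputs).

    tax_base        -- size of the tax base taken from external aggregated data
    effective_rate  -- assumed single rate applied to that base
    amount_received -- tax actually collected
    """
    # Theoretical liability: what would be collected under full compliance.
    theoretical_liability = tax_base * effective_rate
    # The gap is the difference between theoretical liability and receipts.
    gap = theoretical_liability - amount_received
    # Gap as a proportion of theoretical liability (assumed convention).
    gap_share = gap / theoretical_liability
    return theoretical_liability, gap, gap_share


# Made-up figures: a base of 1,000 units taxed at 10%, with 92 units received.
liability, gap, share = top_down_gap(1_000.0, 0.10, 92.0)
print(f"theoretical liability={liability:.1f}, gap={gap:.1f}, share={share:.1%}")
```

Note that the result says nothing about which taxpayers or behaviours produce the gap, which is the limitation described above.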

Bottom-up methods

We have used 3 broad types of bottom-up methods: random enquiry programs, statistical-based approaches and model-based approaches.

Random enquiry programs

A random enquiry program (REP) is a process for selecting tax returns for evaluation. As the name suggests, the tax returns are randomly selected. This ensures that every return has the same likelihood of being chosen.

This is unlike operational audit selection processes, which focus on taxpayers considered to have a higher risk of non-compliance with a potentially large amount of tax at risk. Operational audit data is biased towards this high risk, high consequence segment of taxpayers.

In contrast, random selection avoids any systematic selection of segments of the population. It is designed to provide an unbiased representation of taxpayer information.
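The difference between random and risk-based selection can be shown with a small simulation. This is a sketch under stated assumptions: the population is invented, each value stands for the adjustment a review would find for that return, and scaling the sample mean up to the population is a textbook estimator rather than a description of any actual REP design.

```python
import random


def random_enquiry_estimate(adjustments, sample_size, seed=0):
    """Scale the mean adjustment of a simple random sample up to the population."""
    rng = random.Random(seed)
    # Every return has the same chance of selection.
    sample = rng.sample(adjustments, sample_size)
    return sum(sample) / sample_size * len(adjustments)


def risk_based_estimate(adjustments, sample_size):
    """Same scaling, but reviewing only the returns with the largest adjustments.

    This mimics operational audit selection and is biased upwards.
    """
    sample = sorted(adjustments, reverse=True)[:sample_size]
    return sum(sample) / sample_size * len(adjustments)


# Hypothetical population: 90 compliant returns and 10 with an adjustment of 100.
population = [0.0] * 90 + [100.0] * 10
true_total = sum(population)

print("true total:        ", true_total)
print("random sample (30):", random_enquiry_estimate(population, 30))
print("risk-based (10):   ", risk_based_estimate(population, 10))
```

Averaged over many random samples, the first estimator centres on the true total, while the risk-based one overstates it because it only ever sees the high-risk segment.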

Statistical-based approaches

Statistical-based approaches use a set of mathematical models to estimate an outcome where it would be impractical to obtain a data set that covers 100% of the population being estimated.

The 2 types of statistical-based approaches used within the tax gap program to estimate various tax gaps are regression analysis and extreme value theory.

Regression analysis

Regression analysis is a standard statistical technique for estimating the relationships between one variable and a series of other variables. The regression can be used to identify the probability or the magnitude of the tax gap using all available taxpayer records and compliance results.

To produce reliable and credible results when using regression analysis, corrections need to be made for selection bias. This bias exists because the taxpayers selected for our compliance activities tend to be higher-risk taxpayers, so their results are not representative of the whole population. If we don't adjust for this bias, our estimates are likely to be wrong and our conclusions misleading. We adjust for selection bias using either:

  • propensity score matching
  • Heckman’s correction.

The benefit of regression analysis is that it is useful in identifying characteristics that help predict whether or not a taxpayer is non-compliant, as well as characteristics that help predict the degree of non-compliance. Based on these characteristics, or drivers, the size of the tax gap can be estimated for the taxpayers that are modelled to be non-compliant.
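The effect of correcting for selection bias can be illustrated with a simplified nearest-neighbour form of propensity score matching. This is a minimal sketch, not the method used for any published gap: the propensity scores are assumed to be already estimated (in practice they would come from a model such as a logistic regression), and the function names and figures are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_scores, score):
    """Index of the value in sorted_scores closest to score (ties go to the lower one)."""
    i = bisect_left(sorted_scores, score)
    if i == 0:
        return 0
    if i == len(sorted_scores):
        return len(sorted_scores) - 1
    before, after = sorted_scores[i - 1], sorted_scores[i]
    return i if after - score < score - before else i - 1


def matched_gap_estimate(audited, unaudited_scores):
    """Impute each unaudited taxpayer's adjustment from the closest audited match.

    audited          -- list of (propensity_score, observed_adjustment)
    unaudited_scores -- propensity scores of taxpayers who were not reviewed
    """
    audited = sorted(audited)
    scores = [score for score, _ in audited]
    observed = sum(adjustment for _, adjustment in audited)
    imputed = sum(audited[nearest_index(scores, s)][1] for s in unaudited_scores)
    return observed + imputed


def naive_gap_estimate(audited, population_size):
    """Biased estimate: assumes audited taxpayers are typical of everyone."""
    mean_adjustment = sum(adjustment for _, adjustment in audited) / len(audited)
    return mean_adjustment * population_size


# Hypothetical data: audits concentrate on high-propensity (high-risk) taxpayers.
audited = [(0.1, 0.0), (0.5, 200.0), (0.9, 1000.0)]
unaudited_scores = [0.1, 0.1, 0.1, 0.2, 0.6]

print("naive:  ", naive_gap_estimate(audited, len(audited) + len(unaudited_scores)))
print("matched:", matched_gap_estimate(audited, unaudited_scores))
```

In this invented data set most unaudited taxpayers resemble the low-risk audited case, so the matched estimate is well below the naive one.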

Extreme value theory

Extreme value theory is appropriate when the data is characterised by extreme outlier observations – for example, the data follows the 80/20 rule. That is, a small number of the data points (20%) make up a majority of the total value (80%).
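A quick way to check whether a data set shows this kind of concentration is to compute the share of the total value held by the largest observations. The sketch below is illustrative only; the function name and the data are made up.

```python
def top_share(values, fraction=0.2):
    """Share of the total value held by the largest `fraction` of observations."""
    ordered = sorted(values, reverse=True)
    # Number of observations in the top group (at least one).
    count = max(1, round(len(ordered) * fraction))
    return sum(ordered[:count]) / sum(ordered)


# Hypothetical amendments: 8 small values and 2 very large ones.
amendments = [1, 1, 1, 1, 1, 1, 1, 1, 16, 16]
print(f"top 20% of observations hold {top_share(amendments):.0%} of the value")
```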

This type of data is commonly seen in finance and science. We also see it in the data related to amendments to tax returns, both positive and negative. This can be from taxpayer adjustments or as a result of our compliance activities.

When we look at extreme values, we look to the relationship between the size of the extreme values and their rank. This relationship is estimated and applied to the population to inform the final tax gap estimate.
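One simple way to picture a relationship between size and rank is a straight-line fit on a log-log scale, where log(size) is approximately intercept + slope × log(rank). The sketch below fits that line by ordinary least squares. It is an assumption made for illustration (a basic power-law fit) rather than the specific extreme value model used for any published gap, and the data are synthetic.

```python
import math


def fit_rank_size(values):
    """Least-squares fit of log(size) = intercept + slope * log(rank).

    Values must be positive; rank 1 is the largest value.
    """
    ordered = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(value) for value in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predicted_size(intercept, slope, rank):
    """Size implied by the fitted line for a given rank."""
    return math.exp(intercept + slope * math.log(rank))


# Synthetic observations that follow size = 1000 / rank exactly.
observed = [1000.0 / rank for rank in range(1, 51)]
intercept, slope = fit_rank_size(observed)
print(f"slope={slope:.3f}, size predicted at rank 100={predicted_size(intercept, slope, 100):.2f}")
```

Once fitted, the line can be used to infer sizes at ranks that were not observed, which is the sense in which the relationship is applied to the wider population.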

Model-based approaches

Where a random enquiry is not suitable, and available data does not match the assumptions required for a statistical approach, we use model-based approaches.

Model-based approaches identify the key themes, factors or channels that contribute to the gap, which are then used to inform the final estimate. Like all our estimates, they draw on all available data including expert judgment, management information and system data to inform the final estimate.

These approaches can also be individually referred to as:

  • micro-analytical simulation
  • illustrative
  • channel analysis.

What they have in common is a 3-step structure: the gap is disaggregated into components, known information about each component is analysed, and the component results are aggregated into a final estimate.
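The disaggregate, analyse and aggregate pattern can be sketched as follows. Everything in this example is hypothetical: the channel names, the amounts at risk and the non-compliance rates are placeholders, and multiplying an amount at risk by an assumed rate is only one simple way a channel could be analysed.

```python
def aggregate_channels(channels):
    """Estimate a gap per channel, then sum to a total.

    channels maps a channel name to (amount_at_risk, assumed_non_compliance_rate).
    """
    # Analysis step: estimate each channel separately from what is known about it.
    per_channel = {
        name: amount_at_risk * rate
        for name, (amount_at_risk, rate) in channels.items()
    }
    # Aggregation step: combine the channel estimates into the final figure.
    return per_channel, sum(per_channel.values())


# Disaggregation step: hypothetical channels with made-up inputs.
channels = {
    "channel_a": (2_000.0, 0.05),
    "channel_b": (500.0, 0.20),
    "channel_c": (1_000.0, 0.01),
}

per_channel, total = aggregate_channels(channels)
for name, estimate in per_channel.items():
    print(f"{name}: {estimate:.1f}")
print(f"total: {total:.1f}")
```

Keeping the per-channel results alongside the total shows which components drive the estimate, and lets expert judgment be applied channel by channel.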