6020 Assessing the Reliability of Data
May-2024

Overview

To obtain sufficient and appropriate audit evidence, engagement teams often use data to perform audit procedures and to support audit findings and conclusions. When using data as inputs to audit procedures or to support audit findings and conclusions, engagement teams must obtain sufficient and appropriate evidence that the data is reliable, meaning that it is accurate and complete. This manual section focuses on assessing the reliability of data.

OAG Policy

When using data as input to perform audit procedures or to support audit findings and conclusions, engagement teams shall evaluate whether the data is reliable for its intended purpose, including obtaining audit evidence about its completeness and accuracy. [May-2024]

OAG Guidance

What CSAE 3001 says about data

CSAE 3001 requires the consideration of the relevance and reliability of the information to be used as evidence and requires that sufficient and appropriate evidence be obtained in forming the audit conclusion. OAG Audit 1051 Sufficient appropriate audit evidence provides guidance on sufficiency and appropriateness of audit evidence.

Why it is important

The nature and complexity of information systems in which data is recorded, stored, transformed, and extracted do not guarantee data reliability because errors can still be introduced by people or processes. The practitioner must evaluate whether the data is sufficiently reliable for their purposes. Unreliable data may lead to unreliable findings and conclusions.

Definitions

What data is

Data is a digital representation of information: a collection of discrete or continuous values describing quality, quantity, facts, statistics, or other parameters of information. Data can exist in multiple forms, including source data and system-generated outputs.

Source data

Source data, for purposes of this guidance, is data that has been obtained by the practitioner in the form in which it is stored rather than in the form in which it may be presented, used, or reported by end users or user interfaces. Source data includes raw data. A primary example is one or more data tables from databases. Source data is typically taken from the data’s primary storage location, whether that be a dataset, database, data warehouse, data mart, a spreadsheet, or any other data repository. Source data may be obtained from data held by the audited entity or obtained from third parties.

System-generated outputs

System-generated outputs, for purposes of this guidance, are data that information systems have produced by compiling, aggregating, summarizing, filtering, or otherwise transforming source data (Exhibit 1). Examples include

  • standard outputs from off-the-shelf packages (outputs from software purchased from a third-party vendor)
  • outputs routinely generated by the entity’s programmed or configured systems
  • outputs generated through end-user computing tools, such as Microsoft Excel, Microsoft Access, or similar products
  • outputs generated by third-party information systems

System-generated outputs can be in the form of a report, a spreadsheet (such as an Excel file), or a text file (such as a comma-separated values file).

Exhibit 1—Relationship between source data and system-generated outputs


How data can be used in a direct engagement

Data can be used in multiple ways, including, but not limited to, the following:

  • in risk assessment procedures
  • to identify the population of items from which a sample will be selected for further testing
  • as an input to perform analytical procedures
  • as contextual information in the assurance report
  • as evidence to support audit findings and conclusions in the assurance report

Collecting information

To assess the risk of data not being reliable, engagement teams should have a good understanding of how the data is collected, the systems it is extracted from, and the relevant controls over data.

This understanding of the data, its nature, and the internal controls over it provides key inputs for evaluating the risk of data being unreliable and for determining the best assessment strategy. There are many ways of assessing reliability.

The following are examples of questions that engagement teams could ask to assess data and to understand the processes to generate it. The responses to these questions can help to determine risk areas and therefore anticipate issues regarding the reliability of data.

Examples of questions to consider

  • How is your data maintained (database systems, Microsoft Excel or Word files, paper)?
  • Is your data generated using an automated process, or is it entered manually?
  • At what level (transaction, individual, or program) is the information collected, or has it been aggregated?
  • How would you obtain assurance on the accuracy and completeness of your data?
  • Have your data sources been otherwise assessed for measures of data quality?
  • How would you prevent improper access to your data?
  • Do you have controls on the entry, modification, and deletion of your computerized data?
  • For what purposes is your data used (for example, tracking, reporting)?
  • Is there an oversight function for the data sources in your organization that ensures that proper data management practices are maintained?
  • What impact or role do your data sources have with regard to influencing legislation, policy, or programs?
  • Do your data sources capture information that is likely to be considered sensitive or controversial?
  • What is your data’s classification level (unclassified, Protected A, and so on)?
  • With regard to your data, what documentation can you provide and how is it maintained (data dictionary, details on data controls, and so on)?

There are many reports or documents that engagement teams may request and review to help in acquiring a good understanding of data.

Examples of documents to consider

  • documentation on relevant information systems and processes
  • documentation on data quality procedures
  • relevant user manuals
  • relevant internal audit reports

Understanding data relevance and reliability

The relevance of data refers to the extent to which evidence has a logical relationship with, and importance to, the issue being addressed.

The reliability of data means that data is complete and accurate as follows:

  • Completeness of the data refers to the extent to which all transactions that occurred are input into the system, are accepted for processing, are processed once and only once by the system, and are properly included in the outputs.
  • Accuracy of the data refers to the extent to which recorded data reflects the actual underlying information (for example, amounts, dates, or other facts are consistent with the original inputs entered at the source). There are other components of accuracy to consider when the data is a system-generated output (for example, the accuracy of a financial report may include the accuracy of its calculations, that is, mathematical accuracy).

Accuracy and completeness need to be assessed only if engagement teams deem the data to be relevant. Once data is deemed relevant, accuracy and completeness become the main attributes to assess.

Data reliability versus data integrity

Data reliability and data integrity are 2 different concepts. Data integrity refers to the accuracy with which data has been extracted from the information system (source), transferred to engagement teams, and transformed into an appropriate format for engagement teams to use. This distinction is important because data can have integrity but not reliability (for example, when insufficient evidence has been obtained over the accuracy and completeness of the information). Therefore, obtaining assurance only over integrity is not sufficient for engagement teams to conclude that data is sufficiently reliable.

Factors affecting the reliability of data

The reliability of data is mainly influenced by the following factors (Exhibit 2):

  • Source of data: Different data sources will require different approaches to evaluating reliability. Data can be from an internal or external source, and data can be integrated from a variety of sources.
  • Nature of the data: The nature of data, such as complex data, sensitive data, or classified data, will require different approaches to evaluating reliability.
  • Internal controls over data: These controls ensure that data contains all of the data elements and records needed for the engagement and reflects the data entered at the source or, if available, in the source documents.

Exhibit 2—Descriptions and examples of factors affecting the reliability of data

Source of data

External source: While recognizing that exceptions may exist, data from credible third parties is usually more reliable than data generated within the audited organization. External information sources may include academics, researchers, and parliamentary committees.

Circumstances may exist that could affect the reliability of data from an external source. For example, information obtained from an independent external source may not be reliable if the source is not knowledgeable or if biases exist.

Examples of data that may be obtained from external data sources include

  • research papers
  • technical reports from industry experts

Internal source: Internal data refers to data collected within the organization. For example, engagement teams may intend to make use of the entity’s performance reports generated from its information system. Internal reports may be produced from different types of systems, and the information can be obtained from a variety of sources.

Examples of data that may be obtained from internal data sources include

  • information existing within the entity’s information systems

Nature of the data

The nature of data is also an important factor to consider. Data could be simple, highly complex, sensitive, or classified as secret. When data is extremely complex, engagement teams may require the use of an expert to evaluate its reliability.

Internal controls over data

The reliability of data is influenced by controls designed to ensure that data contains all of the data elements and records needed for the engagement and reflects the data entered at the source or, if available, in the source documents. Controls over data can be manual, automated, or IT-dependent manual controls. For example, an IT-dependent manual control may exist where data entered into the system requires approval by an authorized individual before being processed.

The factors affecting the reliability of data will be considered by engagement teams to assess the risk that data is not reliable.

Determining the extent of the reliability assessment

Engagement teams should consider the elements below when determining the extent of the assessment:

  • the purpose for which the data is used
  • the significance of data as evidence
  • the risks that data is not reliable

There is no specific formula or recipe to precisely determine the extent of work to assess the reliability of data. Engagement teams need to consider and document the factors listed above and conclude on the basis of their professional judgment.

The purpose for which data is used

The purpose for which the data is to be used will affect the level of effort and the nature of audit evidence required (Exhibit 3).

Exhibit 3—Purpose for using data in direct engagements

Engagement teams need to devote considerably more effort when assessing the reliability of data to be used as inputs to audit procedures or to support audit findings and conclusions compared with data to be used as contextual or background information.

Audit procedures, audit findings, and conclusions

In accordance with OAG policy, when using data as an input to perform audit procedures or to support audit findings and conclusions, engagement teams shall evaluate whether the data is reliable for its intended purpose, including obtaining audit evidence about its completeness and accuracy. A reliability assessment is required when data is used for those 3 specific purposes.

Contextual or background information

Background information generally sets the tone for reporting the engagement results or provides information that puts the results in proper context. Data used in contextual or background information in the audit report is typically not evidence supporting the assurance conclusion.

Level of risk of using unreliable data

Using unreliable data would likely weaken the analysis and lead engagement teams to incorrect audit findings or conclusions. The level of risk that data is not reliable should affect the nature and extent of testing required. Engagement teams should consider the factors affecting data reliability to assess the risk of data not being reliable. As mentioned previously, these factors are the source of data, the nature of the data, and the internal controls over data (Exhibit 4).

Exhibit 4—Factors affecting the reliability of data

Source of data

  • Generally more reliable (lower reliability risk): independent third party (external source)
  • Generally less reliable (higher reliability risk): entity (internal source)

Nature of the data

  • Generally more reliable (lower reliability risk): simple data
  • Generally less reliable (higher reliability risk): highly complex data

Internal controls over data

  • Generally more reliable (lower reliability risk): related internal controls exist and are effective
  • Generally less reliable (higher reliability risk): related internal controls are ineffective or nonexistent

Procedures to assess the reliability of data

Engagement teams may use a variety of procedures to assess the reliability of data. The desired level of evidence can be obtained through tests of controls, substantive procedures, or a mix of both. On the basis of the level of assessed risk, engagement teams need to determine the most efficient approach to obtain the desired level of evidence.

Control testing

Control testing can be an excellent source of evidence. Engagement teams need to obtain an understanding of the internal controls, including the controls and processes of the information systems where data is maintained. Engagement teams will need to identify which controls are the most relevant for data reliability and assess whether the relevant controls have been designed and implemented appropriately. It is only after concluding on the design and implementation of the relevant controls that engagement teams can consider testing the effectiveness of those controls.

Manual and automated controls can both provide assurance over the reliability of data. However, since data is usually created, processed, stored, and maintained using information systems, engagement teams may have to consider testing the automated controls. Testing the operating effectiveness of the IT general controls (ITGCs) and application controls over the information systems in which the data is stored and maintained may be an effective and efficient approach to obtaining audit evidence over the completeness and accuracy of the data (Exhibit 5).

Exhibit 5—Controls that can provide assurance over the reliability of data

Automated controls

Controls performed by IT applications or enforced by application security parameters.

IT general controls (ITGCs)

Policies and procedures that apply to all the entity’s IT processes that support the continued proper operation of the IT environment, including the continued effective functioning of information processing controls and the integrity of information (that is, the completeness, accuracy, and validity of information) in the entity’s information system.

Application controls

Application controls pertain to the scope of individual business processes or application systems and include controls within an application around input, processing, and output. The objective is to ensure that

  • inputs are accurate, complete, authorized, and correct
  • data is processed as intended
  • the data stored is accurate and complete
  • outputs are accurate and complete
  • records are maintained to track the process of data from input to storage and to the eventual output

An example of an application control is an input control. Consider a scenario in which companies are required to submit laboratory results electronically, showing the level of harmful substances released into the air and waterways, on a weekly basis. Input controls can restrict the fields that can be edited or uploaded electronically into the database. In this case, the system would generate the date when the company uploaded its laboratory results; companies could not edit the date field. The engagement team could test the restrictions on electronic submissions to validate the input controls, ensuring data integrity. Further testing would be required to ensure the reliability of the data, as described below.
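To illustrate, the following is a minimal sketch (in Python with pandas) of how a team might re-validate such input restrictions against an extract of the submissions table. The file and field names (submissions.csv, company_id, upload_date), the audit period, and the weekly cadence threshold are assumptions for illustration only; the actual tests would depend on the entity’s system.

    import pandas as pd

    # Hypothetical extract of the electronic submissions table (names assumed)
    subs = pd.read_csv("submissions.csv", parse_dates=["upload_date"])

    # Expectation 1: the system stamps the upload date, so the field should
    # never be missing and should fall within the period under audit.
    missing_dates = subs[subs["upload_date"].isna()]
    out_of_period = subs[(subs["upload_date"] < "2024-01-01")
                         | (subs["upload_date"] > "2024-12-31")]

    # Expectation 2: submissions are weekly, so no company should show a gap
    # of more than 7 days between consecutive uploads.
    subs = subs.sort_values(["company_id", "upload_date"])
    subs["gap"] = subs.groupby("company_id")["upload_date"].diff()
    late = subs[subs["gap"] > pd.Timedelta(days=7)]

    print(len(missing_dates), "missing dates;",
          len(out_of_period), "out-of-period dates;",
          len(late), "gaps longer than a week to follow up")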

Understanding the nature and source of the data will be important in determining whether ITGCs or application controls are relevant in aiding the auditor in gathering audit evidence over the data. Engagement teams may consult the IT audit specialist for the assessment of relevant ITGCs and application controls (also called information processing controls).

The Annual Audit Manual also includes useful information in section OAG Audit 5035.2 Identify the risks arising from the use of IT and the related ITGCs, and in section OAG Audit 5035.4 Information processing objectives. While not directly applicable to direct engagements, engagement teams can consult those sections to deepen their knowledge and understanding of the relevance of ITGCs or application controls in the engagement.

Spreadsheet—Control considerations

Spreadsheets can be easily changed and may lack certain control activities, which results in an increased inherent risk of error, such as the following:

  • input errors (errors that arise from flawed data entry, inaccurate referencing, or other simple cut-and-paste functions)
  • logic errors (errors in which inappropriate formulas are created and generate improper results)
  • interface errors (errors that arise from importing data from or exporting data to other systems)
  • other errors (errors that include inappropriate definitions of cell ranges, inappropriately referenced cells, or improperly linked spreadsheets)

Spreadsheet controls may include one or more of the following:

  • ITGC-like controls over the spreadsheet
  • controls embedded within the spreadsheet (similar to an automated application control)
  • manual controls around the data input and output of the spreadsheet

It is likely that the entity will not have implemented many, or any, ITGC-like controls over its spreadsheets that can be tested. In some cases, entities implement manual controls over their spreadsheets. As a result, engagement teams should focus on those manual controls over the data input into the spreadsheet and the output calculated by the spreadsheet.

Substantive testing

Engagement teams may use various substantive tests, as they can be an efficient and effective way to test the reliability of system-generated outputs. Substantive tests may include procedures such as tracing output items to source documents or reconciling outputs to independent, reliable sources.

Obtaining audit evidence over the completeness and accuracy of data can also involve tracing individual facts, transactions, or events to and from source documents. This will help engagement teams to determine whether the data accurately and completely reflects these documents. Testing for completeness involves tracing from source documents to the systems or applications in which data is stored, and testing for accuracy involves tracing from the systems or applications back to the source documents.
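As a simple illustration of the two tracing directions, the sketch below (Python with pandas) reconciles a hypothetical system extract against a register of source documents by key. All file and column names (source_documents.csv, system_extract.csv, document_id, amount) are assumptions for illustration; in practice, tracing may also involve inspecting physical or imaged documents.

    import pandas as pd

    # Hypothetical inputs (names assumed): a register of source documents
    # and the data as stored in the entity's system
    source_docs = pd.read_csv("source_documents.csv")
    extract = pd.read_csv("system_extract.csv")

    # Completeness: trace from source documents to the system; any source
    # document absent from the extract is a potential completeness exception.
    fwd = source_docs.merge(extract, on="document_id", how="left", indicator=True)
    missing_from_system = fwd[fwd["_merge"] == "left_only"]

    # Accuracy: trace from the system back to source documents; records with
    # no supporting document, or whose key fields disagree, are exceptions.
    back = extract.merge(source_docs, on="document_id", how="left",
                         indicator=True, suffixes=("_sys", "_src"))
    unsupported = back[back["_merge"] == "left_only"]
    matched = back[back["_merge"] == "both"]
    mismatched = matched[matched["amount_sys"].ne(matched["amount_src"])]

    print(len(missing_from_system), "source documents missing from the system")
    print(len(unsupported), "records without a supporting source document")
    print(len(mismatched), "records whose amounts disagree with the source")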

On the basis of the desired level of assurance sought, engagement teams may use substantive analytical procedures, accept-reject testing, or purposeful sampling to select a sample of data records to be traced to and from source documents (Exhibit 6).

Exhibit 6—Substantive tests to test the reliability of outputs


Substantive analytical procedures

Substantive analytical procedures evaluate information through the analysis of plausible relationships among both financial and non-financial data. Analytical procedures also encompass the necessary investigation of identified fluctuations or relationships that are inconsistent with other relevant information or that differ significantly from an engagement team’s expectation.

When using analytical procedures to test the reliability of data, engagement teams should consider using information that has already been audited or using third-party information. When engagement teams have no choice but to use unaudited data to build their expectations or to analyze a plausible relationship, the reliability of that data must be assessed before placing reliance on it.
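As a minimal sketch of this idea (in Python with pandas), the example below builds an expectation from audited prior-period totals and flags differences beyond a threshold for investigation. The file and column names, the assumed 3% growth rate, and the 10% investigation threshold are illustrative assumptions only; an actual procedure would set the expectation and threshold on the basis of the engagement’s own evidence.

    import pandas as pd

    # Hypothetical inputs (names assumed): audited prior-period totals and
    # the unaudited current extract, both summarized by region
    prior = pd.read_csv("prior_year_audited.csv")    # region, audited_total
    current = pd.read_csv("current_extract.csv")     # region, reported_total

    comparison = prior.merge(current, on="region")

    # Expectation (assumed for illustration): totals grow with a known 3%
    # program expansion; differences beyond 10% are investigated.
    comparison["expected"] = comparison["audited_total"] * 1.03
    comparison["pct_diff"] = (
        (comparison["reported_total"] - comparison["expected"]).abs()
        / comparison["expected"]
    )
    to_investigate = comparison[comparison["pct_diff"] > 0.10]

    print(to_investigate[["region", "expected", "reported_total", "pct_diff"]])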

The Annual Audit Manual includes useful information in section OAG Audit 7030 Substantive analytics. Engagement teams may consult this section to get more detailed information on substantive analytical procedures that might be relevant for their analyses.

Accept-reject testing

The objective of accept-reject testing, also referred to as attribute testing, is to gather sufficient evidence to either accept or reject a characteristic of interest. It does not involve the projection of a monetary misstatement in an account or population; therefore, accept-reject testing is used only when engagement teams are interested in a particular attribute or characteristic and not a monetary balance. When testing the underlying data of a report, engagement teams can apply accept-reject testing specifically to test the accuracy and completeness of the data included in the report. If the number of exceptions identified exceeds the tolerable number, the test is rejected as not providing the desired evidence.

The Annual Audit Manual also includes useful information in section OAG Audit 7043 Accept-reject testing, and in section OAG Audit 7043.1 A five step approach to performing accept-reject testing. While not directly applicable to direct engagements, engagement teams can consult those sections to deepen their knowledge and understanding of accept-reject testing and the 5-step approach to performing this type of test in their engagements.

Accept-reject testing covers the accuracy and completeness of the data as follows (a brief sketch follows this list):

  • Accuracy: Select a sample of items from the data, and trace them to the underlying systems or applications for specific attributes (such as birth date or social insurance number).
  • Completeness: Select a sample (a different sample than selected for accuracy) of items from an independent source, and confirm that they are present in the data.
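The sketch below (Python with pandas) shows one way the two samples might be drawn and the completeness check performed. The sample size of 25, the tolerable number of exceptions, and the file and key names are illustrative assumptions; actual sample sizes would follow the guidance referenced above.

    import pandas as pd

    # Hypothetical inputs (names assumed)
    report_data = pd.read_csv("report_data.csv")          # data behind the report
    independent = pd.read_csv("independent_source.csv")   # independent population

    # Accuracy: a random sample from the report data, exported for tracing
    # to the underlying systems for the attributes of interest
    accuracy_sample = report_data.sample(n=25, random_state=1)
    accuracy_sample.to_csv("accuracy_sample_for_tracing.csv", index=False)

    # Completeness: a separate random sample from the independent source,
    # checked for presence in the report data by key
    completeness_sample = independent.sample(n=25, random_state=2)
    present = completeness_sample["record_id"].isin(report_data["record_id"])
    exceptions = completeness_sample[~present]

    # Accept or reject against the tolerable number of exceptions (assumed: 0)
    print("Completeness exceptions:", len(exceptions),
          "- reject the test if this exceeds the tolerable number")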

Purposeful testing

Purposeful testing involves selecting items to be tested on the basis of a particular characteristic. Engagement teams often prefer this approach because it provides the opportunity to exercise judgment over which items to test. Purposeful testing can be applied to either a specific part of the data or the whole dataset. The results from purposeful testing are not projected to the untested items in a population. For more detailed information on purposeful selection, engagement teams can consult OAG Audit 6040 Selection of Items for Review.

Use technology to perform testing

Using technology to verify the completeness and accuracy of complex or large volumes of system-generated outputs can be an efficient audit approach. To interrogate data, computer-based tools such as IDEA or Power BI may be used to conduct tests that cannot be done manually. Such testing is generally used to validate the accuracy and completeness of processing by the information systems that generated the data.

The following are examples of substantive tests that can be done using technology:

  • finding and following up on anomalies such as outliers associated with a specific geographic location
  • replicating the system-generated output by running your own independent queries or programs on the actual source data (that is, reperformance of the report)
  • evaluating the logic of the programmed system-generated output or ad hoc query, for example, by
    • inspecting application system configurations
    • inspecting vendor system documentation
    • interviewing program developers (usually not enough by itself)

Engagement teams will still need to test the reliability of the source data included in the system-generated output.

A variety of programming skills are needed, depending on the scope of what will be tested—from producing cross-tabulations on related data elements to duplicating sophisticated automated processes using more advanced programming techniques. Consult with internal specialists as needed.
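For example, the sketch below (Python with pandas) reperforms a hypothetical summary report from its source data and scans for outliers by region, corresponding to the first two examples listed above. All file and column names are assumptions for illustration.

    import pandas as pd

    # Hypothetical inputs (names assumed): the source table and the entity's
    # system-generated summary report by region
    source = pd.read_csv("source_transactions.csv")   # region, amount, ...
    report = pd.read_csv("system_report.csv")         # region, total_amount

    # Reperformance: independently recompute the report from the source data
    recalc = (
        source.groupby("region", as_index=False)["amount"].sum()
              .rename(columns={"amount": "recalculated_total"})
    )
    check = report.merge(recalc, on="region", how="outer")
    # ne() also flags lines present on only one side (NaN compares unequal)
    diffs = check[check["total_amount"].ne(check["recalculated_total"])]

    # Anomaly scan: flag amounts more than 3 standard deviations from the
    # mean within each region for follow-up
    z = source.groupby("region")["amount"].transform(
        lambda s: (s - s.mean()) / s.std()
    )
    outliers = source[z.abs() > 3]

    print(len(diffs), "report lines that do not reproduce from the source data")
    print(len(outliers), "transactions flagged as outliers for follow-up")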

Data integrity checks

The integrity of data refers to the accuracy with which data has been extracted from the information system (source), transferred to engagement teams, and transformed into an appropriate format for engagement teams to use. Data integrity checks alone are not sufficient to assess the reliability of data.

Data integrity checks include checking for

  • out-of-range values, such as dates outside of scope
  • invalid values
  • the total number of records provided against record count
  • the total amount of a field provided against control totals
  • gaps in fields or missing data, such as missing dates, locations, regions, identification numbers, registration numbers, or other fields
  • missing values
  • duplicate records

Other data integrity checks include

  • looking for unexpected aspects of the data—for example, extremely high values associated with a certain geographic location
  • testing relationships between data elements, such as whether data elements correctly follow a skip pattern from a questionnaire
  • ensuring consistency with other source information (such as public or internal reports)
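The sketch below (Python with pandas) implements several of the checks listed above on a hypothetical extract. The file and field names, the period in scope, and the control totals (record count and field total) are assumptions for illustration; in practice, the control totals would accompany the data transfer.

    import pandas as pd

    # Hypothetical extract (names, period, and control totals assumed)
    extract = pd.read_csv("data_extract.csv", parse_dates=["event_date"])

    # Record count and field total against the control totals provided
    # with the transfer (values assumed for illustration)
    assert len(extract) == 125000, "record count differs from control total"
    assert abs(extract["amount"].sum() - 9876543.21) < 0.01, "field total differs"

    # Out-of-range values: dates outside the period in scope
    out_of_scope = extract[(extract["event_date"] < "2023-04-01")
                           | (extract["event_date"] > "2024-03-31")]

    # Gaps and missing values in key fields
    missing = extract[extract[["record_id", "region", "amount"]].isna().any(axis=1)]

    # Duplicate records on the key
    dupes = extract[extract.duplicated(subset="record_id", keep=False)]

    print(len(out_of_scope), "out-of-scope dates;",
          len(missing), "records with missing key fields;",
          len(dupes), "duplicate keys")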

Data used by a practitioner’s expert or entity’s expert

Where the work of an expert is relevant to the context of an engagement, the reliability of data provided to the expert should be evaluated. Where a practitioner’s expert has been engaged, the practitioner should be directly involved in and knowledgeable about the data and other information provided to the expert. Where the entity engages an expert directly, the data and other information necessary for the expert to perform the work will often be provided without the practitioner’s direct involvement or knowledge. As such, when using the work of an entity’s expert, engagement teams should perform additional procedures to identify and assess the data.

For example, where an engagement team plans to use the work of an actuary engaged by the entity, the engagement team should ask management and the actuary to identify all relevant data and information used by the expert and, where appropriate, test the underlying completeness and accuracy of such data and information. For instance, census data used by an actuary is typically extracted from the entity’s payroll or personnel information systems; therefore, testing the completeness and accuracy of that data is necessary to evaluate the reliability of the source data.

For more information on the use of experts, refer to OAG Audit 2070 Use of experts.

Outcomes of the assessment and documentation

Outcomes of the assessment

After the assessment of data reliability, engagement teams need to analyze the results of the work performed and conclude on whether the data is reliable. The result of the assessment determines whether the data is deemed reliable or unreliable.

Data is deemed reliable when engagement teams obtain reasonable assurance on the completeness and accuracy of the data and are willing to accept the extent of errors present in the data and the risk associated with using it. The greater the number of significant errors in key data elements, the higher the likelihood that the data is unreliable and unusable for the engagement team’s purposes.

Documentation

The audit logic matrix for the engagement should briefly address how an engagement team plans to assess the reliability of data, as well as any limitations that might be in place because of errors in the data. All work performed as part of the data reliability assessment should be documented and included in the engagement workpapers. This includes all testing, information reviews, and interviews related to data reliability. In addition, decisions made during the assessment, including the final assessment of whether the data is sufficiently reliable for the purposes of the engagement, should be summarized and included with the workpapers.