Resources

The Data Quality Metrics (DQM) project builds on the harmonized data quality framework (Kahn et al., 2016) to implement a platform that standardizes data quality metrics and supports the assessment and visualization of data quality output. The framework categorizes disparate DQ terms and findings using a standard vocabulary, providing a foundation for new tools such as the DQM system.

  • Kahn et al. 2016: A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data
  • Resulting DQ Framework: Existing DQ terms and concepts harmonized and organized into a framework of categories and subcategories, as listed below
    • DQ Categories and Subcategories
      • Conformance: Do data values adhere to specific standards and formats?
        • Subcategories:
          • Value Conformance
          • Relational Conformance
          • Computational Conformance
      • Completeness: Are data values present?
      • Plausibility: Are data values believable?
        • Subcategories:
          • Uniqueness Plausibility
          • Atemporal Plausibility
          • Temporal Plausibility
    • DQ Assessment contexts:
      • Verification
      • Validation

About

The Data Quality Metrics system is a pilot project that establishes a set of standardized data quality metrics that can be used to compare data sources to each other. This allows researchers to better understand candidate data sources before querying and analyzing them. The system addresses the following goals:

  • Develop a web-based tool that operationalizes the leading theoretical data quality harmonization framework (Kahn et al., 2016) as a set of standard data quality metrics, together with a tool to author new metrics
  • Create a beta version of the platform with Sentinel and PCORnet as use cases
  • Collaborate with existing DQ stakeholder community and incorporate feedback on tools developed
  • Create open source tools (web application, flexible data model, visualization templates) that illustrate data quality metric authoring, capture of data quality metric output (measures), and evaluation and visualization of supplied measures.

Funding

  • This pilot project was funded by the US Food and Drug Administration through the Department of Health and Human Services (HHS) Contract number HHSF223201400030I / HHSF22301006T.
  • Data Quality Metrics is an FDA Sentinel project funded by the Department of Health and Human Services (HHS) Office of the Assistant Secretary for Planning and Evaluation (ASPE).
  • ASPE, via the Patient-Centered Outcomes Research Trust Fund (PCORTF), has funded multiple HHS-agency sponsored projects focused on building the infrastructure necessary to support robust real-world evidence generation. This project is funded via FDA to develop a framework to systematically assess data quality across data sources and distributed data networks.

Four stakeholder sessions were held in September 2019 to demonstrate a beta version of the software. The sessions addressed the following topics:

  1. Demonstration and discussion related to authoring data quality metrics - these two sessions were targeted to stakeholders who are interested in the creation and discussion of metrics that can be used for multiple data sources and research questions.
  2. Demonstration and discussion regarding exploring database fingerprints and submission of measures - these two sessions were targeted to stakeholders who are interested in evaluating the fitness for use of various data sources or for various research questions.

We are using Service Desk tickets to enable continued discussion among community members. If you have any feedback or suggestions, please create a ticket at Data Quality Metrics Community Board.

Data Quality Metrics Stakeholder Session - 2019.09.03

Data Quality Metrics Stakeholder Session - 2019.09.04

Data Quality Metrics Stakeholder Session - 2019.09.06 am

Data Quality Metrics Stakeholder Session - 2019.09.06 pm

How to author a Metric
  • Navigate to the Metrics page to review existing metrics

  • To submit a new metric, click “Author a Metric” and begin by entering a brief description of the metric. You can then select the Results Type, Domain, and DQ Harmonization Category from the drop-down menus.
    ** Note: You must be logged in and have permission to "Author Metrics" to see the menu item and button. If you do not see either, please enter a request at the Service Desk for permission. **
  • Based on the information entered, a list of similar existing metrics will populate the panel below for you to review. Please confirm that this is a new metric and not a duplicate of an existing metric.
  • Click “Save and Continue” to move to the Metrics Details form and fill out the following fields:
    • Description – details on the purpose of the metric (required)
    • Justification – additional context or reasoning for the creation of the metric (required)
    • Expected Results - description of what the author is expecting as a result of executing the metric against a data source (required)
    • Jira # for public comments – a ticket will be created to enable discussion on the specific metric
    • Support Documents - upload any supporting documentation and/or examples.
  • Once the details of the metric have been filled in, select “Save and Continue”
  • On the Metric Summary page, choose to either “Submit for Review” or “Save Draft”. You will be able to view all of your submitted and draft metrics on your Dashboard.
  • For more information, please see the “Authoring Data Quality Metrics” stakeholder video session or reach out via Data Quality Metrics Community Board.

How to submit Measures via the Portal or the API
Using a Template Via the Portal

Measures are the quantitative representation of a Metric. A measure submission consists of the measure data plus metadata describing attributes of the dataset, such as the associated metric, the Organization and DataSource, the run date, and the dataset date range. A measure submission can be a Microsoft Excel or JSON document; a template that includes the metric identifier and expected results type can be downloaded from the metric's detail page.

A measure submission consists of the following:

Metadata:
  • MetricID – the metric identifier, which can be obtained from the metric details page.
  • OrganizationID – the GUID of the Organization, which can be obtained from CNDS (optional).
  • Organization – the name of the Organization.
  • DataSourceID – the GUID of the DataSource, which can be obtained from CNDS (optional).
  • DataSource – the name of the DataSource.
  • Run Date – the date the data for the Measure was obtained.
  • Network – the network the DataSource/Organization belongs to (optional).
  • Common Data Model – the Common Data Model (optional).
  • Common Data Model Version – the version number of the Common Data Model (optional).
  • Database System – information about the database the data was queried from (optional).
  • Date Range Start – the starting date the dataset encompasses; must be earlier than the Date Range End.
  • Date Range End – the ending date the dataset encompasses; must be later than the Date Range Start.
  • Results Type – the Results Type of the metric; must be a valid results type and match the results type of the metric the measure is for.
  • Results Delimiter – the delimiter within the Results (optional).
  • Supporting Resources – a URL to a location containing any supporting resources (code and/or documentation) used to execute the query for this Measure.
Measure:
  • Raw Value – the predefined value set. For example, a SEX value set may contain the following: “M”, “F”, “A”, “OT”.
  • Definition – descriptive text for the raw values. Following the above example, the definitions for the raw values would be “Male”, “Female”, “Ambiguous”, and “Other”, respectively.
  • Measure – the result or answer for the metric of interest, expressed according to the results type (count vs. percentage).
  • Total – the overall count/percentage of Measures.

All dates must be formatted in the "year-month-day" numeric format: yyyy-MM-dd. All dates are treated and displayed as local dates, without taking any timezone information into account. The Excel template contains definitions for each metadata attribute; these definitions also apply to the JSON template.
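The date rules above can be checked locally before a submission is uploaded. The following is a minimal Python sketch and not part of DQM: the helper names `parse_dqm_date` and `check_date_range` are hypothetical, and the sketch assumes only the rules stated here (dates in yyyy-MM-dd form, and a Date Range Start earlier than the Date Range End).

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day (yyyy-MM-dd).
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_dqm_date(text: str) -> date:
    """Parse a yyyy-MM-dd string into a plain date (no timezone attached)."""
    if not _DATE_PATTERN.fullmatch(text):
        raise ValueError(f"expected yyyy-MM-dd, got {text!r}")
    year, month, day = (int(part) for part in text.split("-"))
    # date() itself raises ValueError for impossible dates such as 2019-02-30.
    return date(year, month, day)


def check_date_range(start_text: str, end_text: str) -> None:
    """Raise ValueError unless the start date is strictly before the end date."""
    start = parse_dqm_date(start_text)
    end = parse_dqm_date(end_text)
    if not start < end:
        raise ValueError("Date Range Start must be earlier than Date Range End")


# Hypothetical example values; a valid range passes silently.
check_date_range("2019-01-01", "2019-09-03")
```

Using plain `date` objects rather than timezone-aware timestamps mirrors the statement that dates are handled as local dates.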

Excel template:

  • Tab 1 contains the metadata for the measure submission. Tab 2 contains the measure dataset.
  • The format of the document must stay the same as the template.

JSON template:

  • The same required properties as the Excel template must be completed.
  • The names of the properties are the same as the Excel template except without spaces.
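The naming rule for the JSON template can be illustrated with a short Python sketch. This is a hedged illustration only: the helper names `to_json_name` and `build_submission` are hypothetical, all values are invented placeholders, and the overall layout (metadata properties at the top level with the measure rows under a `Measures` key) is an assumption. The template downloaded from the metric's detail page remains the authority on the actual structure.

```python
import json


def to_json_name(excel_label: str) -> str:
    """Drop spaces from an Excel template label, e.g. 'Run Date' -> 'RunDate'."""
    return excel_label.replace(" ", "")


# Hypothetical metadata keyed by the Excel template labels (optional fields omitted).
metadata = {
    "MetricID": "<metric id from the metric details page>",
    "Organization": "Example Organization",
    "DataSource": "Example DataSource",
    "Run Date": "2019-09-03",
    "Date Range Start": "2010-01-01",
    "Date Range End": "2018-12-31",
    "Results Type": "<results type of the metric>",
    "Supporting Resources": "https://example.org/dqm-query-code",
}

# Measure rows following the SEX value-set example; the counts are invented.
measures = [
    {"Raw Value": "M", "Definition": "Male", "Measure": 480, "Total": 1000},
    {"Raw Value": "F", "Definition": "Female", "Measure": 510, "Total": 1000},
    {"Raw Value": "A", "Definition": "Ambiguous", "Measure": 2, "Total": 1000},
    {"Raw Value": "OT", "Definition": "Other", "Measure": 8, "Total": 1000},
]


def build_submission(metadata: dict, measures: list) -> dict:
    """Rename every label to its space-free JSON form.

    Placing the rows under a 'Measures' key is an assumption, not the
    documented DQM layout.
    """
    document = {to_json_name(label): value for label, value in metadata.items()}
    document["Measures"] = [
        {to_json_name(label): value for label, value in row.items()}
        for row in measures
    ]
    return document


print(json.dumps(build_submission(metadata, measures), indent=2))
```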

After completing the template, navigate to Submit Measures and follow the instructions to upload the measure submission. Keep a copy of your measure submission template; uploaded files are not stored by DQM after they are processed.
Details about your submissions can be found on your Dashboard.
If you need a submission removed, please enter a request at the Data Quality Metrics Community Board.

Submitting Directly to the API

Measures can be submitted directly to the API via HTTP POST to the "/api/measures/submit" endpoint. The endpoint validates the user based on the credentials specified in the Authorization header of the HTTP request. The credentials must be Base64-encoded and supplied using the standard Basic Authentication scheme. The request is processed as follows:

  1. The user is authenticated, and the Submit Measures permission is confirmed.
  2. The body of the POST is assumed to be the JSON version of the measure submission template.
  3. If the submission does not pass content validation, an error response is returned and the submission is terminated.
  4. If validation succeeds, the measure submission is persisted to the database and a success response is returned.

When submitting directly to the API, only JSON is accepted.

C# Sample Posting a JSON File
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class MeasureSubmissionSample
{
    // Posts a JSON measure submission file to the DQM API using Basic Authentication.
    public static async Task SubmitAsync()
    {
        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            // Replace {username} and {password} with your credentials.
            // UTF-8 is assumed here; ASCII-only credentials encode identically.
            http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes("{username}:{password}")));

            try
            {
                using (var stream = File.OpenRead(@"C:\path\to\MeasureSubmission.json"))
                using (var content = new StreamContent(stream))
                {
                    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

                    var result = await http.PostAsync("https://dataquality.healthdatacollaboration.net/api/measures/submit", content);

                    if (!result.IsSuccessStatusCode)
                    {
                        // For example, a failed content validation returns an error response.
                        string error = await result.Content.ReadAsStringAsync();
                        System.Diagnostics.Debug.WriteLine(error);
                    }
                }
            }
            catch (HttpRequestException ex)
            {
                // HttpClient reports connection-level failures with HttpRequestException, not WebException.
                System.Diagnostics.Debug.WriteLine(ex.Message);
            }
        }
    }
}
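
The same request can be made from other languages. The following is a hedged Python sketch that uses only the standard library. The function names `build_submit_request` and `submit_measures` are hypothetical, the URL is the one used in the C# sample, and UTF-8 encoding of the credentials is an assumption.

```python
import base64
import urllib.error
import urllib.request

SUBMIT_URL = "https://dataquality.healthdatacollaboration.net/api/measures/submit"


def build_submit_request(json_body: bytes, username: str, password: str,
                         url: str = SUBMIT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a JSON measure submission."""
    # Basic Authentication: Base64-encode "username:password". This is an
    # encoding, not encryption, so it should only be sent over HTTPS.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json_body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def submit_measures(path: str, username: str, password: str) -> str:
    """Send the JSON file at `path` and return the response body as text."""
    with open(path, "rb") as handle:
        request = build_submit_request(handle.read(), username, password)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example, a failed content validation) raise
        # HTTPError; its body carries the server's error response.
        detail = err.read().decode("utf-8", "replace")
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err
```

Separating request construction from sending makes it possible to inspect the headers without contacting the server; calling `submit_measures("MeasureSubmission.json", username, password)` performs the actual upload.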

Open Source Software

The Data Quality Metrics (DQM) system was implemented and released as open source software. The source code, along with technical and user documentation, is publicly available on GitHub: https://github.com/PopMedNet-Team/DataQualityMetrics

FHIR

The data structure for DQ metrics and measure results, hereafter referred to as the ‘payload’, was codified in a common format that is not specific to any data model and allows for application portability and interoperability. JSON was selected as the format for expressing the metrics and results, although XML, BSON, or another structured data format would also have been an option. Additionally, we investigated the potential of leveraging parts of the data structure defined by the Fast Healthcare Interoperability Resources (FHIR) standards. The FHIR standards are used for the transfer of electronic healthcare information based on existing logical models and can be extended for specific purposes. While this project does not formally use FHIR services, there may be opportunities in the future to structure the DQ payload in ways that align with current FHIR data structures.

Qlik Sense was selected as the visualization tool for users to explore the characteristics of data sources. Qlik can connect to data sources using standard APIs, and the assumption is that other analytic tools able to load data via an API (e.g., Tableau) could be used in place of Qlik. Technical documentation on Qlik and the available APIs is posted in the GitHub repository: https://github.com/PopMedNet-Team/DataQualityMetrics

To see current Qlik visualizations for DQM, please navigate to Explore DQM.

Through exploration of existing practice and published work, the project team developed the Data Quality Data Model that underlies the DQM system and captures items of interest (metadata) describing the source system, its measures, and each metric. The data model, database diagram, and other relevant documentation can be found in the DQM GitHub repository here: https://github.com/PopMedNet-Team/DataQualityMetrics