What Are Data Standards? – In the context of health care, the term data standards encompasses methods, protocols, terminologies, and specifications for the collection, exchange, storage, and retrieval of information associated with health care applications, including medical records, medications, radiological images, payment and reimbursement, medical devices and monitoring systems, and administrative processes (Washington Publishing Company, 1998).
- Definition of data elements: determination of the data content to be collected and exchanged.
- Data interchange formats: standard formats for electronically encoding the data elements (including sequencing and error handling) (Hammond, 2002).
- Terminologies: the medical terms and concepts used to describe, classify, and code the data elements, and the data expression languages and syntax that describe the relationships among the terms/concepts.
- Knowledge representation: standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.

Suggested Citation: “4 Health Care Data Standards.” Institute of Medicine. 2004. Patient Safety: Achieving a New Standard for Care, Washington, DC: The National Academies Press. doi: 10.17226/10863.
At the most basic level, data standards are about the standardization of data elements: (1) defining what to collect, (2) deciding how to represent what is collected (by designating data types or terminologies), and (3) determining how to encode the data for transmission.
The first two points apply to both paper-based and computer-based systems; for example, a laboratory test report will have the same data elements whether paper or electronic. A data element is considered the basic unit of information, having a unique meaning and subcategories of distinct units or values (van Bemmel and Musen, 1997).
In computer terms, data elements are objects that can be collected, used, and/or stored in clinical information systems and application programs, such as patient name, gender, and ethnicity; diagnosis; primary care provider; laboratory results; date of each encounter; and each medication.
Data elements of specific clinical information, such as blood glucose level or cholesterol level, can be grouped together to form datasets for measuring outcomes, evaluating quality of care, and reporting on patient safety events. Associated with data elements are data types that define their form. Simple data types include date, time, numeric, currency, or coded elements that rely on terminologies (Hammond, 2002).
Examples of complex data types are names (a structure for names) and addresses. For comparability and interchange, data types must be universal and must be carried through all uses of the data. The designation of common scientific units is also necessary.
Units (e.g., kilograms, pounds) must be specified as another measure to prevent adverse events such as those related to dosing errors. Until recently, each institution or organization defined independently the data it wished to collect and the units employed, did not use data types, and created local vocabularies, resulting in fragmentation that prevented reuse.
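To make the point about data types and units concrete, the following is a minimal sketch in Python, not a reference implementation of any standard. The class name `Measurement`, its field names, and the JSON layout are assumptions made for illustration; only the pound-to-kilogram factor is a fixed definition.

```python
import json
from dataclasses import dataclass

KG_PER_LB = 0.45359237  # exact definition of the avoirdupois pound in kilograms


@dataclass(frozen=True)
class Measurement:
    """Hypothetical data element: a numeric value that always carries its unit."""
    element: str   # name of the data element, e.g. "body_weight"
    value: float   # numeric data type
    unit: str      # explicit unit; never implied by local convention

    def to_kg(self) -> "Measurement":
        """Return the same measurement expressed in kilograms."""
        if self.unit == "kg":
            return self
        if self.unit == "lb":
            return Measurement(self.element, self.value * KG_PER_LB, "kg")
        raise ValueError(f"unsupported unit: {self.unit!r}")

    def encode(self) -> str:
        """Encode the element for transmission (illustrative JSON layout)."""
        payload = {"element": self.element, "value": round(self.value, 2), "unit": self.unit}
        return json.dumps(payload, sort_keys=True)


weight = Measurement("body_weight", 154.0, "lb").to_kg()
print(weight.encode())
```

Because the unit travels with the value, a receiving system can convert or reject a weight instead of guessing whether it was recorded in pounds or kilograms.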
Histories; Allergies; Immunizations; Social histories; Vital signs; Physical examination; Physicians’ notes; Nurses’ notes; Laboratory tests; Diagnostic tests; Radiology tests; Diagnoses; Medications; Procedures; Clinical documentation; Clinical measures for specific clinical conditions; Patient instructions; Dispositions; Health maintenance schedules

Other Data Sources for Patient Safety Information
Policies and procedures; Human resources records; Materials management systems; Time and attendance records; Census records; Decision support alert logs; Coroners’ datasets; Claims attachments; Admissions data; Disease registries; Discharge data; Malpractice data; Patient complaints and reports of adverse events; Reports to professional boards; Trigger datasets (e.g., antidote drugs for adverse drug events); Computerized physician order entry systems; Bar-code medication administration systems; Clinical trial data

Patient Safety Datasets and Taxonomies
- Eindhoven classification taxonomy; Near misses (development needed); Adverse events (development needed); Accreditation reporting dataset (Joint Commission on Accreditation of Healthcare Organizations)
- Medical Specialty Society, such as: Trauma/emergency; Surgery; Anesthesia; Radiology; Family practice; Pediatrics
- Medical Event Reporting System for Transfusion Medicine (MERS TM); United States Pharmacopeia (USP); National Coordinating Council for Medication Error Reporting and Prevention (NCC MERPS); MedMarx (by USP for medication events); Emergency Care Research Institute (ECRI)
- States with mandatory reporting systems: Colorado; California; Connecticut; Florida; Georgia; Kansas; Massachusetts; Maine; Minnesota; New Jersey; New York; Nevada; Ohio; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Washington; Oregon (voluntary system)

Federal Reporting Systems Datasets
- Agency for Healthcare Research and Quality: Prevention Quality Indicators (PQI); Quality Indicators for Patient Safety (QIPS)
- Centers for Disease Control and Prevention: National Electronic Disease Surveillance System (NEDSS); Dialysis Surveillance Network (DSN); Vaccine Adverse Event Reporting System (VAERS); Vaccine Safety Datalink (VSD); National Nosocomial Infection Surveillance System (NNIS); National Center for Health Statistics (NCHS)
- Centers for Medicare and Medicaid Services: Medicare Patient Safety Monitoring System (MPSMS); Minimum Data Set (MDS) for Nursing Home Care; End-stage renal disease (ESRD); Outcome and Assessment Information Set (OASIS) for Home Care
- Food and Drug Administration: Adverse Event Reporting System (AERS); Manufacturer and User Data Experience (MAUDE); Special Nutritionals Adverse Event Monitoring System (SNAEMS); Biological Product Deviation Reporting System (BPDR/BIODEV); Medical Product Surveillance Network (MedSun); MedWatch (postmarket surveillance)
- Nuclear Regulatory Commission

For data elements that rely on terminologies and their codes for definition, merely referencing a terminology does not provide enough specificity. To ensure data comparability, specific codes must be identified within each terminology set to represent the data elements.
This becomes a major issue for some of the larger clinical terminologies, which may have hundreds or thousands of terms. It is also a major issue given the amount of data that must be collected for the data sources and requirements listed in Table 4-1 and that will be encompassed by the national health information infrastructure (NHII).
Common data standards are essential to simplify and streamline data requirements and allow the information systems that carry the data to function as an integrated enterprise.
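The requirement that specific codes be identified within a terminology, rather than the terminology merely being named, can be sketched as a binding between each data element and an explicit set of allowed codes. Everything in this Python sketch is hypothetical: the terminology name `EXAMPLE-LAB-TERMINOLOGY`, the codes `EX-0001` and `EX-0002`, and the helper `is_comparable` are placeholders, not entries from any real code system.

```python
from typing import NamedTuple


class Coding(NamedTuple):
    """A code together with the terminology (code system) it comes from."""
    system: str
    code: str


# Hypothetical bindings: each data element names the exact codes that may
# represent it. The system name and codes are placeholders, not real entries.
VALUE_SETS = {
    "blood_glucose": frozenset({Coding("EXAMPLE-LAB-TERMINOLOGY", "EX-0001")}),
    "cholesterol": frozenset({Coding("EXAMPLE-LAB-TERMINOLOGY", "EX-0002")}),
}


def is_comparable(element: str, coding: Coding) -> bool:
    """True only if the coding is one of the codes bound to the data element."""
    return coding in VALUE_SETS.get(element, frozenset())


print(is_comparable("blood_glucose", Coding("EXAMPLE-LAB-TERMINOLOGY", "EX-0001")))
print(is_comparable("blood_glucose", Coding("LOCAL-VOCABULARY", "GLU")))
```

A locally invented code fails the check even when it describes the same test, which is the fragmentation problem described earlier in the section.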
What are data elements used for in healthcare?
A common data element (CDE) is a standardized, precisely defined question paired with a set of specific allowable responses. It is used systematically across different sites, studies, or clinical trials to ensure consistent data collection.
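As a rough illustration of the definition above, a CDE can be modeled as a question plus a closed set of allowable responses, with validation applied wherever the data are collected. The class `CommonDataElement` and the smoking-status example are invented for this sketch and do not come from any published CDE catalog.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical CDE: one precisely worded question and its allowed answers."""
    name: str
    question: str
    allowed: FrozenSet[str]

    def validate(self, response: str) -> str:
        """Return the response unchanged, or raise if it is not an allowed value."""
        if response not in self.allowed:
            raise ValueError(f"{response!r} is not an allowable response for {self.name}")
        return response


# Illustrative CDE; the wording and response options are assumptions.
smoking_status = CommonDataElement(
    name="smoking_status",
    question="What is the participant's current smoking status?",
    allowed=frozenset({"current", "former", "never", "unknown"}),
)

print(smoking_status.validate("former"))
```

Two sites that both use this definition produce directly comparable answers, whereas free-text entries such as "quit last year" would be rejected at collection time.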
What is a data set in healthcare?
Healthcare data sets include a vast amount of medical data, various measurements, financial data, statistical data, demographics of specific populations, and insurance data, to name just a few, gathered from various healthcare data sources.
What are examples of data elements?
A data element is a basic unit of information that has a unique meaning and subcategories (data items) of distinct value. Examples of data elements include gender, race, and geographic location (source: CNSSI 4009-2015). It is also defined as the smallest named item of data that conveys meaningful information (source: NIST Privacy Framework Version 1.0, under Data Element).
Why are structured data elements important in EHR?
An expert panel identified and assessed electronic health record and health information exchange structured data elements to support future development of social risk factor computable phenotyping.

ABSTRACT

Objectives: Computable social risk factor phenotypes derived from routinely collected structured electronic health record (EHR) or health information exchange (HIE) data may represent a feasible and robust approach to measuring social factors. This study convened an expert panel to identify and assess the quality of individual EHR and HIE structured data elements that could be used as components in future computable social risk factor phenotypes.

Study Design: Technical expert panel.

Methods: A 2-round Delphi technique included 17 experts with an in-depth knowledge of available EHR and/or HIE data. The first-round identification sessions followed a nominal group approach to generate candidate data elements that may relate to socioeconomics, cultural context, social relationships, and community context. In the second-round survey, panelists rated each data element according to overall data quality and likelihood of systematic differences in quality across populations (ie, bias).

Results: Panelists identified a total of 89 structured data elements. About half of the data elements (n = 45) were related to socioeconomic characteristics. The panelists identified a diverse set of data elements. Elements used in reimbursement-related processes were generally rated as higher quality. Panelists noted that several data elements may be subject to implicit bias or reflect biased systems of care, which may limit their utility in measuring social factors.

Conclusions: Routinely collected structured data within EHR and HIE systems may reflect patient social risk factors. Identifying and assessing available data elements serves as a foundational step toward developing future computable social factor phenotypes.

Am J Manag Care. 2022;28(1):e14-e23. https://doi.org/10.37765/ajmc.2022.88816

Takeaway Points

Computable phenotypes are measurements of patient conditions or characteristics that can be obtained from existing data by combining a defined set of variables and logical expressions. Routinely collected structured data within electronic health records and health information exchange systems are reflective of characteristics of social and economic well-being and thus may be amenable to use in social risk factor phenotype development.

- Computable phenotypes represent an additional method of measuring patient social factors that leverages existing data sources and workflows.
- Structured data elements used in reimbursement-related processes may be of the highest quality for use in phenotyping.
- Currently collected structured data elements such as International Classification of Diseases, Tenth Revision Z codes and Logical Observation Identifiers Names and Codes are potentially susceptible to bias.
- Computable phenotyping will require transforming or combining data elements into novel and potentially more informative measures.

Social risk factors include patients’ nonclinical, economic, and contextual characteristics that may adversely affect health.1-3 As important drivers of morbidity, mortality, utilization, and health care costs, social risk factors are important for health risk assessment and both individual and population health management.4,5 Specifically, social risk factor information may improve risk prediction models 6,7 and identify patients in need of social services.8 Because of the potential value of social risk factor information, federal agencies, clinical organizations, and health system experts advocate for better collection and use of patient social risk factor information.9

Despite the potential value of social risk factor information to individual patient care and population health management activities, health care organizations’ current methods to measure this information are fraught with challenges.
Patient-facing social risk questionnaires have not been consistently validated, 10 diagnostic codes such as International Classification of Diseases, Tenth Revision ( ICD-10 ) Z codes are underutilized, 11-13 area-level measures (eg, zip code–level demographics) can mask heterogeneity across individuals and are prone to the ecological fallacy, 14,15 and extracting free-text clinical documentation from electronic health records (EHRs) remains difficult for many organizations.16,17 As a result, any one of these methods may not be sufficient for health care organizations to collect the information necessary to make inferences about patients’ and populations’ social risk factors.
Computable social risk factor phenotypes derived from routinely collected structured EHR or health information exchange (HIE) data may be an alternative approach to measuring social factors.18 Computable phenotypes are representations of patient conditions or characteristics that can be obtained from EHR data by combining a defined set of variables and logical expressions.19-21 Data such as demographics, insurance information, billing histories, appointment status, emergency contacts, and language preferences exist in most EHRs.
Although such data may not contribute to understanding patients’ clinical status, they are reflective of characteristics of social and economic well-being and thus may be amenable to use in social risk factor phenotypes. Additionally, using structured data elements already routinely collected as part of clinical and business operation workflows may mitigate the challenges of underutilization of screening surveys and diagnosis coding, additional data collection burden, and the technical implementation hurdles of natural language processing (NLP) for textual data.
Moreover, when applied to HIE data, which combine patient data across organizations over time, robust computable social risk factor phenotypes may be constructed that reduce missing data challenges 22,23 and increase explanatory power.19 However, biomedical informatics and health services research have devoted little attention to the potential value of developing phenotypes from existing structured data for social risk factor measurement in favor of questionnaires, area-level data linkage, and NLP.18,24 From existing work on computable phenotypes, we know that poor data quality 25 and data that are inconsistently collected across patient populations may result in inaccurate and otherwise biased phenotypes.26 Therefore, as a foundational step, we convened an expert panel to identify and assess the quality of individual EHR and HIE data elements that might be useful, accurate, and unbiased components to include in future computable social risk factor phenotype development.
Our work sets the stage for the future quantitative development of computable social risk phenotypes by providing expert insight to guide selection of candidate data elements.

MATERIALS AND METHODS

We used a 2-round Delphi technique to identify and preliminarily evaluate structured data elements as candidates for use in future computable phenotype development.27,28

Expert Panel Formation

We recruited 17 individuals (of 18 invitations) with in-depth knowledge of EHR and/or HIE data based on publications or practice experience in 1 or more of the following 3 areas: EHR or HIE technology management in a health care organization; clinical or operational practice that involved data collection; or EHR or HIE research.
The majority of respondents were affiliated with research or academic medical institutions (n = 14) and the remainder were individuals in leadership roles at health information technology organizations (n = 3). The panel represented organizations located on the East Coast and West Coast and in the South and Midwest.
Five of the panel members were physicians. Expert panel members received a financial incentive of $250 for participating in the focus group and follow-up survey.

Panelists were split across 2 identification sessions that each followed a common protocol. In advance of the identification sessions, we provided each panelist with a summary of the research objectives of identifying and assessing potential structured data elements, a description of the expectations, and a shared definition of social risk factors (ie, any patient-level nonclinical economic, contextual, and psychosocial characteristics and factors).
Because computable phenotypes are useful if generalizable, 21 we instructed panelists to focus on structured data elements that they would expect to be commonly available in EHR or HIE data. We asked panelists to exclude unstructured data elements (eg, clinical note or other text data), data elements requiring linkage to sources outside of typical EHR or HIE systems (eg, tax records), and patient-facing social risk factor questionnaires because those are not widely adopted.
These restrictions were intended to prioritize potential structured data elements that would be widely available. Panelists’ ideas were not restricted to specific age groups. We provided this information to each panel member during a short preparatory phone call in advance of the identification sessions.
Round 1: Identification

The study team (consisting of authors J.R.V., H.E.K., C.M., and C.A.H.) conducted two 90-minute group identification sessions (n = 8 and n = 9) via videoconference. In each of these sessions, we followed a nominal group approach in which each panelist, in turn, was asked to suggest a data element, until all ideas were exhausted.
A research team member documented the ideas generated in real time and displayed them on screen during the session. To help organize the idea generation, when suggesting data elements, panelists were asked to categorize them into 1 or more broad categories of social risk factors 1,29,30 : socioeconomic status (eg, employment, financial, food insecurity/hunger, housing instability); cultural context (eg, language, health literacy); social relationships (eg, social support, incarceration); and community context (eg, housing quality, transportation, safety/violence).
The 2 identification sessions were treated as independent (ie, findings from the first were not shared with the second). The research team deduplicated suggested data elements from across the 2 panels.

Round 2: Assessment

Panelists were asked to rate data elements identified in round 1 based on 2 characteristics (defined below): quality and likelihood of systematic differences across demographic groups.
The purpose of this rating exercise was to begin evaluating the feasibility and appropriateness of real-world EHR or HIE data for future computable phenotypes. The rating survey was conducted using REDCap.31,32 Before administering, 4 core research team members and 2 panelists pilot tested the survey.
Is the data element high quality? We defined high-quality data elements as those that were concurrently complete, accurate, and up to date in an EHR or HIE system. These 3 dimensions are common components of data quality frameworks.33,34 Panel members rated each element on a 5-point scale from poor quality to excellent quality (eAppendix A).

What is the likelihood of systematic differences in data quality? Differential data quality across patient demographic groups (eg, race/ethnicity, gender, age, sexual orientation) can lead to inaccurate and biased social risk factor measurement, risk prediction, and population health management activities. Systematic differences in quality could be due to a lack of patient diversity, differential work processes, structural barriers to care, or broader societal conditions.35 Panel members rated each element on a 5-point scale from extremely unlikely different data quality to extremely likely different data quality.
Finally, to better understand each panel member’s frame of reference when completing the survey, we included a single item to gauge if responses were rooted in experiences with data from hospital settings, physician/group practice settings, and/or HIE systems.

Analyses

Analyses were divided by the identification and assessment phases of data collection.
First, we determined counts of identified potential data elements by social risk factor category. We also determined data elements suggested in each social risk category during the identification sessions. These sessions also generated group discussion on potential risks and limitations of each data element, which we summarized.
Next, we computed frequencies and percentages to describe panelists’ ratings of each data element during the assessment portion of the panel. We created 2-way scatterplots to illustrate the plurality of panelists who responded at the 2 extremes of the respective scales (eg, top 2 box approach).36 The plots help identify those factors that they generally perceived to be of higher data quality and also as less likely to have systematic differences across groups.
We plotted the data elements for each social factor category separately. To facilitate visualization, we labeled reported data elements as billing and payment, diagnoses and clinical data, encounters and appointments, identifiers and contact information, language, referrals and orders, social determinants of health codes, and other.
The full distribution of responses for every data element is presented in eAppendix B. To examine consistency in ratings across panel members, we also grouped average data element ratings by social factor category and stratified by panel member type (physician or nonphysician) and primary frame of reference when answering questions (EHR or HIE) (eAppendix C).
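The top 2 box summary mentioned in the Analyses can be illustrated with a short Python sketch. It assumes the two 5-point scales are scored 1 to 5, with 4 and 5 as the favorable end of the quality scale and 1 and 2 as the "unlikely" end of the systematic-difference scale; that numeric coding, the function name `top_two_box`, and the ratings below are invented for illustration and are not the study's data.

```python
from typing import Iterable, Set


def top_two_box(ratings: Iterable[int], boxes: Set[int]) -> float:
    """Share of raters whose rating falls in the chosen two scale points."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(rating in boxes for rating in ratings) / len(ratings)


# Made-up ratings from five hypothetical panelists for one data element.
quality = [5, 4, 3, 2, 4]        # 1 = poor ... 5 = excellent
difference = [1, 2, 2, 4, 3]     # 1 = extremely unlikely ... 5 = extremely likely

high_quality_share = top_two_box(quality, {4, 5})
low_bias_share = top_two_box(difference, {1, 2})
print(high_quality_share, low_bias_share)
```

Each data element then becomes one point on a 2-way scatterplot, with the high-quality share on one axis and the low-likelihood-of-difference share on the other.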
RESULTS

Identification: Potential Data Elements (identification session)

Across the 2 identification sessions, panelists generated a total of 89 structured data elements (Table). However, due to the co-occurring nature of social risk factors, several suggested data elements pertained to multiple categories. About half of the identified elements (n = 45 of 89) were relevant to the socioeconomic status category. Within the socioeconomic status category, most data elements were suggested as relevant to financial status followed by employment, food insecurity/hunger, and housing instability. The socioeconomic category also included several data elements that we considered to be general indicators of socioeconomic-related needs. Candidate data elements reflected billing, identifiers, orders, and utilization data.

Identification: Observations on Concerns, Considerations, and Limitations Regarding Data Elements (identification session)

During identification, panelists recognized multiple limitations related to structured data elements. These concerns included the potential for inherently biased data elements, inconsistent data collection processes, potential variation across patient populations, and the limitations of area-level measures. For example, a panelist noted that discrimination occurs in the care delivery processes and that underserved populations face barriers in accessing services, which could lead to biased data.
Similarly, another panelist noted that credit scores may be very predictive of patients’ financial risks and needs, but this data element is known to be biased by race. Another noted that inconsistent data collection also limited the usefulness of some data elements.
As one panelist stated about documenting homelessness, “There are some ICD codes that nobody uses.” Another panelist agreed with the limited adoption, but he noted that “the ICD code is going to be very specific when used.” Panelists noted that computable phenotypes may need to be developed for different patient populations.
For example, some data elements could be relevant for adults but would not be relevant for pediatric populations. Alternatively, a phenotype could prove useful for patients with high health care utilization only. Finally, one panelist noted the “poor overlap” between area-level measures and patients’ self-reported social risk factors.
Assessment: Perceptions of Data Quality and Likelihood of Systematic Differences in Quality

When responding to the assessment surveys, most (n = 12 of 17) of the panel members reported primarily thinking about data that come from HIE systems. Clinicians and nonclinicians did not vary substantially in their assessments of data quality and likelihood of systematic differences in data quality across populations.
Panelists’ assessment of the likelihood of systematic differences in data quality across populations did vary based on whether they reported primarily thinking of HIE systems vs EHR systems. Those who reported thinking primarily about HIE systems most frequently reported that quality was likely to be different across populations for the socioeconomic, social relationship, and community context categories (eAppendix C).
Socioeconomic Status Data Element Assessment

Data elements that are both higher quality and unlikely to have systematic differences across populations are preferable. For those data elements suggested by the panelists as relevant to socioeconomic status, only identifier and contact information and billing and payment-related data elements were frequently rated as “very good” or “excellent” quality and at the same time also rated as “unlikely” or “extremely unlikely” to have differential quality across patient groups (Figure 1).
These elements included date of birth, last name, address, insurance type, bills in collection, payment method, days in accounts receivable, and outstanding bills. Many more data elements were generally viewed as low quality (ie, “fair” or “poor”). Notably, panelists rated the ICD-10 Z and Logical Observation Identifiers Names and Codes (LOINC) codes that represent various social risk factors, several data elements related to referrals to specific social services and providers, and inability to do telehealth visits as lower quality (ie, “fair” or “poor”) and simultaneously “likely” or “extremely likely” to have differential quality across patient groups.
Cultural Context Data Element Assessment

Panelists rated only address and EHR portal account presence and usage as high quality (ie, “very good” or “excellent”) (Figure 2). Panelists also considered these elements to generally be “unlikely” or “extremely unlikely” to have differential quality across patient groups.
Conversely, more than half of panelists rated presence of advance directives, language of discharge instructions, primary language, and the need and use of interpreters as low quality (ie, “fair” or “poor”) and as “likely” or “extremely likely” to have differential quality across patient groups.
Again, ICD-10 Z codes and similar LOINC codes related to education and literacy were rated as low quality and likely to have different data quality across populations.

Social Relationships Data Element Assessment

Panelists rated few social relationship data elements as high quality overall (ie, “very good” or “excellent”) (Figure 3). The majority of panelists rated social relationship–relevant ICD-10 Z and LOINC codes as low quality and “likely” or “extremely likely” to have differential quality across patient groups.

Community Context Data Element Assessment

In the community context domain (Figure 4), some data elements associated with identity and contact information, diagnoses and clinical data, and encounters and appointments, such as address, arrival by ambulance, and emergency department visits associated with trauma or injury, were viewed by more panelists as higher quality and “unlikely” or “extremely unlikely” to have differential data quality across populations. Other diagnoses and clinical data were also considered “unlikely” or “extremely unlikely” to have differential data quality across populations but were nevertheless viewed as having “fair” or “poor” data quality. Again, panelists rated community context–relevant ICD-10 Z and LOINC codes as “likely” or “extremely likely” to have differential data quality across populations and to be of poorer data quality.
DISCUSSION

Our panel of 17 EHR and HIE data experts identified and commented on routinely collected structured data elements for potential use in the future development of computable social risk factor phenotypes. Panelists highlighted several specific concerns about overall data quality and the potential for systematic quality differences across populations that may lead to bias and other data inaccuracies. This novel and foundational work can be used to help develop future computable phenotypes for social factors.

Data quality (defined in this study as complete, accurate, and up to date) is a long-standing concern in biomedical informatics, particularly when data are used for purposes other than those for which they were originally collected.37 Panelists generally perceived data elements of the highest quality to be those used in reimbursement-related processes (eg, identifiers and contact information, billing and payment-related data, diagnoses), which is consistent with prior studies.38,39 In addition, panelists reported that these data elements were among those less likely to be systematically different in quality across populations.
Given these advantageous qualities, reimbursement-related data elements may be viable candidates for use in computable phenotypes development. In general, the most consistently identified quality concerns related to structured data elements that have been designed to document social risk factors: ICD-10 Z and LOINC codes.
Evidence indicates that these codes are substantially underutilized in practice.11-13 Not only did the expert panel results question the quality of these data, but they also indicated that these were among the most likely to have different data quality across populations. These perceptions align with a recent quantitative analysis indicating that ICD-10 Z codes are a specific indicator of social risk, but one that is collected in a biased fashion.40 Increasing adoption, explicit reimbursement for documenting social needs, or the mapping of screening questionnaires to these standards 41 may eventually increase the utility of these data elements.
However, currently their application to computable phenotyping or other measurement activities appears limited. Poor-quality data can undermine any application to care delivery. However, when data quality is systematically different across populations, the risks increase that any measurement strategy, including computed phenotypes, could perpetuate societal biases and inequitable practices in health care.42,43 Related applications of health data have demonstrated the risk of drawing the wrong inferences about patients.
For example, a widely utilized risk stratification tool systematically recommended healthier White patients over sicker Black patients for care management programs, because it failed to account for differential levels of access.44 Similarly, disease risk models developed in homogenous majority populations do not perform well for minority groups.45 The future development of computable social risk factor phenotypes will require attention to the risks of biased and differential quality data because the processes for collecting health care data are highly variable.46,47 Multiple frameworks and methodologies for identifying and mitigating bias exist, which could be applied to these data.48,49 Additionally, like any advanced analytics interventions, computable social factor phenotypes would need to be continually evaluated and monitored for effectiveness and lack of bias.50 Effectively identifying patients’ social risks is necessary for health care organizations to initiate appropriate referrals to services.51 Vendors, collaboratives, and health care organizations have successfully integrated screening questions into EHR systems and workflows to support data collection.41,52,53 Nevertheless, usage of screening questionnaires in practice is highly variable 54 and, when used, they have increased staff’s data collection burden.55 As a measurement strategy reliant on existing data and one that can be potentially automated, computable social factor phenotypes could support the screening use case while avoiding the challenges of administering questionnaires.
However, screening for social risk factors in health care can be a sensitive issue for patients 56,57 and automated computable phenotypes are admittedly not as transparent a screening strategy as patient-completed questionnaires. If computable social factor phenotypes could be successfully developed, future work should include assessments of patient acceptability.
Next Steps
Developing computable phenotypes was beyond the scope of this Delphi panel. Nevertheless, the findings in this paper provide a candidate list of data elements that could be further evaluated for constructing computable phenotypes. As part of the identification process, panelists explained or justified their suggested data elements.
Often, these explanations took the form of methods for transforming or combining data elements into novel and potentially more informative measures. As an example, panelists emphasized the potential to gain information by looking at changes in data elements over time.
The most salient examples were changes in addresses to identify housing instability, in insurance status for financial status, or in emergency contact information for social relationships, which are regularly updated for billing and reimbursement reasons. Others have suggested similar uses of change in address data over time.58 Likewise, panel members noted the information to be gained by explainable missingness (ie, instances in which social circumstances would result in data not being recorded or intentionally recorded as missing, as in the case of missing zip codes, phone numbers, or addresses for homeless individuals).
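As a minimal sketch of the "changes over time" and "explainable missingness" ideas above, the following Python counts address changes and missing-address records for one patient within a time window. The record layout, the patient identifiers, and the function name `address_features` are hypothetical and chosen only for illustration; they are not taken from the panel's work or from any particular EHR system.

```python
from datetime import date

# Hypothetical address history rows: (patient_id, effective_date, address or None).
# None stands for an address that was not recorded or was recorded as missing.
ADDRESS_HISTORY = [
    ("p1", date(2020, 1, 5), "12 Oak St"),
    ("p1", date(2020, 6, 1), "98 Elm Ave"),
    ("p1", date(2020, 11, 20), None),
    ("p1", date(2021, 2, 14), "4 Pine Ct"),
    ("p2", date(2019, 3, 1), "7 Main St"),
    ("p2", date(2021, 3, 1), "7 Main St"),
]


def address_features(history, patient_id, start, end):
    """Count address changes and missing-address records within [start, end]."""
    rows = [
        (when, addr)
        for pid, when, addr in history
        if pid == patient_id and start <= when <= end
    ]
    rows.sort(key=lambda row: row[0])  # order by effective date only

    changes = 0
    missing = 0
    previous = None
    for _, addr in rows:
        if addr is None:
            missing += 1  # candidate "explainable missingness" signal
            continue
        if previous is not None and addr != previous:
            changes += 1  # a new recorded address differs from the last one
        previous = addr
    return {"address_changes": changes, "missing_address_records": missing}


features_p1 = address_features(ADDRESS_HISTORY, "p1", date(2020, 1, 1), date(2021, 12, 31))
features_p2 = address_features(ADDRESS_HISTORY, "p2", date(2019, 1, 1), date(2021, 12, 31))
print(features_p1)
print(features_p2)
```

Whether a given count of changes or missing records actually indicates housing instability is a validation question that this sketch does not address.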
Other important recommendations related to combining data across patients. For example, novel data points could be created by identifying the number of individuals sharing an address to indicate financial strain or by noting reciprocal emergency contacts as indicators of social support.
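The cross-patient combinations described above could be derived along the following lines. This is a sketch under assumptions: the dictionaries `ADDRESSES` and `EMERGENCY_CONTACT`, the patient identifiers, and both function names are hypothetical, and real data would need address normalization and identity matching first.

```python
from collections import Counter

# Hypothetical inputs: current address per patient, and the patient id (if any)
# listed as each patient's emergency contact.
ADDRESSES = {"p1": "12 Oak St", "p2": "12 Oak St", "p3": "12 Oak St", "p4": "7 Main St"}
EMERGENCY_CONTACT = {"p1": "p2", "p2": "p1", "p3": "p4", "p4": None}


def individuals_per_address(addresses):
    """For each patient, count how many patients share the same address."""
    counts = Counter(addresses.values())
    return {pid: counts[addr] for pid, addr in addresses.items()}


def reciprocal_contacts(contacts):
    """Return pairs of patients who list each other as emergency contacts."""
    pairs = set()
    for pid, contact in contacts.items():
        if contact is None or contact == pid:
            continue
        if contacts.get(contact) == pid:
            pairs.add(frozenset((pid, contact)))
    return pairs


shared = individuals_per_address(ADDRESSES)
pairs = reciprocal_contacts(EMERGENCY_CONTACT)
print(shared)
print(pairs)
```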
Panelists also surfaced the possibility that social need could be identified using evidence of discordant utilization. An example of this type included ordering nutritional supplements or specific meals without an accompanying diagnosis that suggested a clinical need. Lastly, ideas from panel members also included using the information from the components of the patient record.
This included the presence, absence, length, or accessing of social worker notes or even the frequency with which the patient portal was used. Therefore, in addition to raw, untransformed data elements, future work to develop computable phenotypes should consider these data transformations or combinations—including changes over time, explainable missingness, combining data across patients, discordant utilization, and the components of the patient record.
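To illustrate the discordant utilization idea, the sketch below flags patients who have a nutritional supplement order but no diagnosis from a list of clinically justifying conditions. The order label, the diagnosis labels, and the contents of `JUSTIFYING_DIAGNOSES` are placeholders invented for this example; a real phenotype would need clinically reviewed code sets.

```python
# Hypothetical order and diagnosis labels per patient (not real code systems).
ORDERS = {
    "p1": {"nutritional_supplement"},
    "p2": {"nutritional_supplement", "xray"},
    "p3": {"xray"},
}
DIAGNOSES = {
    "p1": set(),
    "p2": {"malabsorption"},
    "p3": set(),
}
# Placeholder set of diagnoses assumed to explain a supplement order clinically.
JUSTIFYING_DIAGNOSES = {"malabsorption", "dysphagia"}


def discordant_supplement_orders(orders, diagnoses, justifying):
    """Return patients with a supplement order and no justifying diagnosis."""
    flagged = []
    for pid, items in orders.items():
        has_order = "nutritional_supplement" in items
        has_justification = bool(diagnoses.get(pid, set()) & justifying)
        if has_order and not has_justification:
            flagged.append(pid)
    return sorted(flagged)


flagged_patients = discordant_supplement_orders(ORDERS, DIAGNOSES, JUSTIFYING_DIAGNOSES)
print(flagged_patients)
```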
Limitations
Given the variation in the usage of social risk factors, risks, and needs in practice and the literature, it is possible that panel members had different conceptualizations of the social risk factors discussed. For identification, such variation likely had little effect, but the actual movement to phenotype construction would require clear construct definitions.
Similarly, our survey did not reflect the multidimensional nature of data quality (eg, conformance, completeness, plausibility) but relied on a single question to reduce respondent burden. Additional work would be necessary to understand the different data quality ratings of each data element.
For example, we cannot tell from this study if elements were rated as low quality due to perceived inaccuracy or that the data elements could not be relied upon because they were used too infrequently. Additional study would be required to compare panelists’ perceptions with actual data quality metrics.
Also, the panelists were instructed to exclude nonstructured and other sources of data. Of course, these are important sources of social risk factor information, but their potential usage in computable phenotypes represents a different set of challenges than the ones explored in this panel.
Still, the identified structured data elements could be combined with structured data from survey questionnaires or even unstructured data extracted from NLP, where available. Such combinations could be more informative. Although expert panel members recognized that social risk factors change over time, the issue of appropriate intervals for measuring social risk factors was not included in this Delphi panel.
Lastly, our expert panel reflected individuals with knowledge about EHR and HIE data sources and processes that generated data for clinical, research, and business purposes. A different set of panel members, with different backgrounds, may have identified other data elements.
CONCLUSIONS
EHRs and HIE systems contain structured data elements that reflect patient social circumstances, and these data may be useful in developing computable phenotypes. Efforts to develop phenotypes should consider data quality and risks for systematic differences across populations. Future computable phenotyping research should validate strategies for incorporating concepts such as changes over time, explainable missingness, combining data across patients, discordant utilization, and the components of the patient record.
Acknowledgments The authors thank Lindsey Sanner, MPH, for her assistance with visualizations. Author Affiliations: Indiana University Richard M. Fairbanks School of Public Health (JRV, HEK, WMT), Indianapolis, IN; Regenstrief Institute (JRV), Indianapolis, IN; Department of Medicine (JA-M) and Department of Family and Community Medicine (LMG), University of California, San Francisco, San Francisco, CA; Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida (JB, CM, EAS, CAH), Gainesville, FL; Population Health Sciences, Weill Cornell Medical College (TRC), New York, NY; Mathematica (GRC), Washington, DC; New York eHealth Collaborative (ND), New York, NY; Owl Health Works LLC (JH), Indianapolis, IN; Departments of Family and Community Medicine and Biomedical Informatics, College of Medicine, The Ohio State University (TRH), Columbus, OH; Indiana Health Information Exchange (JPK), Indianapolis, IN; Johns Hopkins School of Public Health (HK), Baltimore, MD; Department of Population Health, Dell Medical School, The University of Texas at Austin (AK), Austin, TX; Anthem, Inc (JMO), Indianapolis, IN; Department of Pediatrics, Center for Health and Community, University of California, San Francisco (MSP), San Francisco, CA; University of Rochester Medical Center (WP), Rochester, NY; Department of Pediatrics, School of Medicine, Indiana University (SW), Indianapolis, IN.
Source of Funding: This work was supported, in part, by the Indiana Clinical and Translational Sciences Institute Fund and in part by award No. UL1TR002529 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Disclosures: Dr Vest provided consulting to New York eHealth Collaborative and Pima County; is a founder and equity holder in Uppstroms, LLC, a technology company; and has patents pending with Uppstroms. Dr Wiehe reports receiving an incentive for participating in the panel.
Dr Harle’s institution has received research grants related to social risk factors and their measurement. The remaining authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article. Authorship Information: Concept and design (JRV, CAH); acquisition of data (JRV, JA-M, LMG, JB, TRC, GC, ND, JH, TRH, JPK, HK, AJ, HEK, CM, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); analysis and interpretation of data (JRV, JA-M, LMG, JB, TRC, GC, ND, JH, TRH, JPK, HK, AJ, HEK, CM, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); drafting of the manuscript (JRV, JA-M, LMG, CM, HEK, CAH); critical revision of the manuscript for important intellectual content (JRV, JA-M, LMG, JB, TRC, GC, ND, JH, TRH, JPK, HK, AJ, HEK, CM, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); statistical analysis (JRV, HEK, CAH); provision of patients or study materials (JRV, CM, HEK, CAH); obtaining funding (JRV); administrative, technical, or logistic support (JRV, HEK, CM, CAH); supervision (JRV, CAH).
Address Correspondence to: Joshua R. Vest, PhD, MPH, Indiana University Richard M. Fairbanks School of Public Health, 1050 Wishard Blvd, Indianapolis, IN 46202. Email: [email protected]. REFERENCES 1. Green K, Zook M. When talking about social determinants, precision matters.
Health Affairs, October 29, 2019. Accessed December 3, 2019. https://www.healthaffairs.org/do/10.1377/hblog20191025.776011/full/ 2. Alderwick H, Gottlieb LM. Meanings and misunderstandings: a social determinants of health lexicon for health care systems. Milbank Q,2019;97(2):407-419. doi:10.1111/1468-0009.12390 3.
Woolf S, Aron L, eds.U.S. Health in International Perspective: Shorter Lives, Poorer Health, The National Academies Press; 2013.4. Commission on Social Determinants of Health. Closing the Gap in a Generation: Health Equity Through Action on the Social Determinants of Health,
World Health Organization; 2008. 5. Pruitt Z, Emechebe N, Quast T, Taylor P, Bryant K. Expenditure reductions associated with a social service referral program. Popul Health Manag,2018;21(6):469-476. doi:10.1089/pop.2017.0199 6. Bardsley M, Billings J, Dixon J, Georghiou T, Lewis GH, Steventon A. Predicting who will use intensive social care: case finding tools based on linked health and social care data.
Age Ageing,2011;40(2):265-270. doi:10.1093/ageing/afq181 7. Tan M, Hatef E, Taghipour D, et al. Including social and behavioral determinants in predictive models: trends, challenges, and opportunities. JMIR Med Inform,2020;8(9):e18084. doi:10.2196/18084 8.
Kasthurirathne SN, Vest J, Menachemi N, Halverson PK, Grannis SJ. Assessing the capacity of social determinants of health data to augment predictive models identifying patients in need of wraparound social services. J Am Med Inform Assoc,2018;25(1):47-53. doi:10.1093/jamia/ocx130 9. Institute of Medicine.
Capturing Social and Behavioral Domains in Electronic Health Records: Phase 2. The National Academies Press; 2014.10. Henrikson NB, Blasi PR, Dorsey CN, et al. Psychometric and pragmatic properties of social risk screening tools: a systematic review. Am J Prev Med,2019;57(6 suppl 1):S13-S24.
doi:10.1016/j.amepre.2019.07.012 11. Matthew J, Hodge C, Khau M. Z codes utilization among Medicare fee-for-service (FFS) beneficiaries in 2017. CMS. January 2020. Accessed January 19, 2021. https://www.cms.gov/files/document/cms-omh-january2020-zcode-data-highlightpdf.pdf 12. Truong HP, Luke AA, Hammond G, Wadhera RK, Reidhead M, Joynt Maddox KE.
Utilization of social determinants of health ICD-10 Z-codes among hospitalized patients in the United States, 2016-2017. Med Care,2020;58(12):1037-1043. doi:10.1097/MLR.0000000000001418 13. Guo Y, Chen Z, Xu K, et al. International Classification of Diseases, Tenth Revision, Clinical Modification social determinants of health codes are poorly used in electronic health records.
Medicine (Baltimore),2020;99(52):e23818. doi:10.1097/MD.0000000000023818 14. Gottlieb LM, Francis DE, Beck AF. Uses and misuses of patient- and neighborhood-level social determinants of health data. Perm J,2018;22:18-078. doi:10.7812/tpp/18-078 15. Buajitti E, Chiodo S, Rosella LC. Agreement between area- and individual-level income measures in a population-based cohort: implications for population health research.
SSM Popul Health,2020;10:100553. doi:10.1016/j.ssmph.2020.100553 16. Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions.
J Am Med Inform Assoc,2011;18(5):540-543. doi:10.1136/amiajnl-2011-000465 17. Lasser EC, Kim JM, Hatef E, Kharrazi H, Marsteller JA, DeCamp LR. Social and behavioral variables in the electronic health record: a path forward to increase data quality and utility. Acad Med,2021;96(7):1050-1056. doi:10.1097/ACM.0000000000004071 18.
Parikh RB, Jain SH, Navathe AS. The sociobehavioral phenotype: applying a precision medicine framework to social determinants of health. Am J Manag Care,2019;25(9):421-423.19. Frey LJ, Lenert L, Lopez-Campos G. EHR big data deep phenotyping: contribution of the IMIA Genomic Medicine Working Group.
Yearb Med Inform,2014;9(1):206-211. doi:10.15265/iy-2014-0006 20. Verchinina L, Ferguson L, Flynn A, Wichorek M, Markel D. Computable phenotypes: standardized ways to classify people using electronic health record data. Perspect Health Inf Manag,2018;(Fall):1-8.21. Richesson RL, Hammond WE, Nahm M, et al.
Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc,2013;20(e2):e226-e231. doi:10.1136/amiajnl-2013-001926 22. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records.
J Am Med Inform Assoc,2013;20(1):117-121. doi:10.1136/amiajnl-2012-001145 23. Basile AO, Ritchie MD. Informatics and machine learning to define the phenotype. Expert Rev Mol Diagn,2018;18(3):219-226. doi:10.1080/14737159.2018.1439380 24. Feller DJ, Bear Don’t Walk OJ IV, Zucker J, Yin MT, Gordon P, Elhadad N.
Detecting social and behavioral determinants of health with structured and free-text clinical data. Appl Clin Inform,2020;11(1):172-181. doi:10.1055/s-0040-1702214 25. Ahmad FS, Ricket IM, Hammill BG, et al. Computable phenotype implementation for a national, multicenter pragmatic clinical trial: lessons learned from ADAPTABLE.
Circ Cardiovasc Qual Outcomes,2020;13(6):e006292. doi:10.1161/CIRCOUTCOMES.119.006292 26. Richesson R, Smerek M. Electronic health records-based phenotyping. Rethinking clinical trials. June 27, 2014. Accessed April 26, 2021. https://sites.duke.edu/rethinkingclinicaltrials/ehr-phenotyping/ 27. McPherson S, Reese C, Wendler MC.
Methodology update: Delphi studies. Nurs Res,2018;67(5):404-410. doi:10.1097/nnr.0000000000000297 28. Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs,2000;32(4):1008-1015. doi:10.1046/j.1365-2648.2000.t01-1-01567.x 29.
Social determinants of health. HealthyPeople.gov. Accessed March 5, 2020. https://www.healthypeople.gov/2020/topics-objectives/topic/social-determinants-of-health 30. National Academies of Sciences, Engineering, and Medicine. Accounting for Social Risk Factors in Medicare Payment: Identifying Social Risk Factors,
The National Academies Press; 2016.31. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support.
J Biomed Inform,2009;42(2):377-381. doi:10.1016/j.jbi.2008.08.010 32. Harris PA, Taylor R, Minor BL, et al; REDCap Consortium. The REDCap consortium: building an international community of software platform partners. J Biomed Inform,2019;95:103208. doi:10.1016/j.jbi.2019.103208 33. Kahn MG, Callahan TJ, Barnard J, et al.
A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC),2016;4(1):1244-1244. doi:10.13063/2327-9214.1244 34. Lee YW, Strong DM, Kahn BK, Wang RY. AIMQ: a methodology for information quality assessment.
Inf Manage,2002;40(2):133-146. doi:10.1016/s0378-7206(02)00043-5 35. Ferryman K, Pitcan M. Fairness in precision medicine. Data & Society. February 26, 2018. Accessed March 12, 2020. https://datasociety.net/library/fairness-in-precision-medicine/ 36. Russell GJ. Itemized rating scales (Likert, semantic differential, and Stapel).
In: Kamakura W, ed. Marketing Research, Wiley & Sons; 2010. Sheth J, Malhotra NK, eds. Wiley International Encyclopedia of Marketing ; vol 2. Accessed March 17, 2021. https://onlinelibrary.wiley.com/doi/abs/10.1002/9781444316568.wiem02011 37. Weiskopf NG, Weng C.
Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc,2013;20(1):144-151. doi:10.1136/amiajnl-2011-000681 38. Callahan A, Shah NH, Chen JH. Research and reporting considerations for observational studies using electronic health record data.
Ann Intern Med,2020;172(suppl 11):S79-S84. doi:10.7326/M19-0873 39. Horsky J, Drucker EA, Ramelson HZ. Accuracy and completeness of clinical coding using ICD-10 for ambulatory visits. AMIA Annu Symp Proc,2018;2017:912-920.40. Weeks WB, Cao SY, Lester CM, Weinstein JN, Morden NE.
Use of Z-codes to record social determinants of health among fee-for-service Medicare beneficiaries in 2017. J Gen Intern Med,2020;35(3):952-955. doi:10.1007/s11606-019-05199-w 41. Weir RC, Proser M, Jester M, Li V, Hood-Ronick CM, Gurewich D. Collecting social determinants of health data in the clinical setting: findings from national PRAPARE implementation.
J Health Care Poor Underserved,2020;31(2):1018-1035. doi:10.1353/hpu.2020.0075 42. Rutjes AWS, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PMM. Evidence of bias and variation in diagnostic accuracy studies. CMAJ,2006;174(4):469-476. doi:10.1503/cmaj.050090 43.
Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med,2018;169(12):866-872. doi:10.7326/M18-1990 44. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations.
Science,2019;366(6464):447-453. doi:10.1126/science.aax2342 45. Adamson AS, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol,2018;154(11):1247-1248. doi:10.1001/jamadermatol.2018.2348 46. Cohen GR, Friedman CP, Ryan AM, Richardson CR, Adler-Milstein J.
Variation in physicians’ electronic health record documentation and potential patient harm from that variation. J Gen Intern Med,2019;34(11):2355-2367. doi:10.1007/s11606-019-05025-3 47. Overhage JM, McCallie D. Physician time spent using the electronic health record during outpatient encounters. Ann Intern Med,2020;173(7):594-595.
doi:10.7326/M18-3684 48. Bellamy RKE, Dey K, Hind M, et al. AI Fairness 360: an extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. ArXiv, Preprint posted online October 3, 2018. Accessed May 29, 2020. http://arxiv.org/abs/1810.01943 49.
Haneuse S, Daniels M. A general framework for considering selection bias in EHR-based studies: what data are observed and why? EGEMS (Wash DC),2016;4(1):1203. doi:10.13063/2327-9214.1203 50. Embi PJ. Algorithmovigilance—advancing methods to analyze and monitor artificial intelligence–driven health care for effectiveness and equity.
JAMA Netw Open,2021;4(4):e214622. doi:10.1001/jamanetworkopen.2021.4622 51. Garg A, Boynton-Jarrett R, Dworkin PH. Avoiding the unintended consequences of screening for social determinants of health. JAMA,2016;316(8):813-814. doi:10.1001/jama.2016.9282 52.
Buitron de la Vega P, Losi S, Sprague Martinez L, et al. Implementing an EHR-based screening and referral system to address social determinants of health in primary care. Med Care,2019;57(6 suppl 2):S133-S139. doi:10.1097/mlr.0000000000001029 53. Gold R, Bunce A, Cowburn S, et al. Adoption of social determinants of health EHR tools by community health centers.
Ann Fam Med,2018;16(5):399-407. doi:10.1370/afm.2275 54. Cottrell EK, Dambrun K, Cowburn S, et al. Variation in electronic health record documentation of social determinants of health across a national network of community health centers. Am J Prev Med,2019;57(6 suppl 1):S65-S73.
doi:10.1016/j.amepre.2019.07.014 55. Greenwood-Ericksen M, DeJonckheere M, Syed F, Choudhury N, Cohen AJ, Tipirneni R. Implementation of health-related social needs screening at Michigan health centers: a qualitative study. Ann Fam Med,2021;19(4):310-317. doi:10.1370/afm.2690 56. Kusnoor SV, Koonce TY, Hurley ST, et al.
Collection of social determinants of health in the community clinic setting: a cross-sectional study. BMC Public Health,2018;18(1):550. doi:10.1186/s12889-018-5453-2 57. Pinto AD, Glattstein-Young G, Mohamed A, Bloch G, Leung FH, Glazier RH. Building a foundation to reduce health inequities: routine collection of sociodemographic data in primary care.
What type of data is in a patient chart?
Medical charts give healthcare providers a view of a patient’s medical history and supply vital details that help clinicians make sound care decisions. A medical chart is a thorough record of a patient’s medical history and clinical data. It includes information such as demographics, vital signs, diagnoses, surgeries, medications, treatment plans, allergies, laboratory results, radiological studies, and immunization records.
What is data classification for healthcare?
Health data classification (also known as medical coding or medical classification) is the technique of changing the nomenclature of medical diagnoses and procedures into a code number system that is used universally.
What are examples of a set of data?
A data set is a collection of numbers or values that relate to a particular subject. For example, the test scores of the students in a particular class form a data set. The numbers of fish eaten by each dolphin at an aquarium form another data set.
What are 3 data sets?
In one of my previous posts, I talked about what data is and what data attributes mean. This post continues from that one, so if you haven’t read it, read it first in order to have a proper grasp of the topics and concepts I am going to talk about in this article. Please bear with me for the conceptual part; I know it can be a bit boring, but if you have strong fundamentals, nothing can stop you from being a great Data Scientist or Machine Learning Engineer. There are three general characteristics of data sets, namely: Dimensionality, Sparsity, and Resolution.
We shall discuss what exactly they mean, one at a time.
What is Dimensionality? → The dimensionality of a data set is the number of attributes that the objects in the data set have. If a data set has a high number of attributes (also called high dimensionality), it can become difficult to analyse. This problem is referred to as the Curse of Dimensionality. In order to understand what the hell this Curse of Dimensionality is, we first need to understand the other two characteristics of data.
What is Sparsity? → For some data sets, such as those with asymmetric features, most attributes of an object have values of 0; in many cases fewer than 1% of the entries are non-zero. Such data is called sparse data, or it can be said that the data set has sparsity.
What is Resolution? → The patterns in the data depend on the level of resolution. If the resolution is too fine, a pattern may not be visible or may be buried in noise; if the resolution is too coarse, the pattern may disappear.
For example, variations in atmospheric pressure on a scale of hours reflect the movement of storms and other weather systems. On a scale of months, such phenomena are not detectable. Now, coming back to the Curse of Dimensionality: it means that many types of data analysis become difficult as the dimensionality (the number of attributes) of the data set increases.
Specifically, as dimensionality increases, the data becomes increasingly sparse in the space that it occupies. For classification, this can mean that there are not enough data objects to allow the creation of a model that reliably assigns a class to all possible objects.
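A small back-of-the-envelope sketch can make this sparsity effect concrete. It assumes, purely for illustration, that each attribute is split into a fixed number of bins, so the space has `bins_per_attribute ** n_attributes` cells; a fixed number of objects can then occupy at most a shrinking fraction of those cells as attributes are added. The function name and the chosen numbers are hypothetical.

```python
def max_occupied_fraction(n_objects, bins_per_attribute, n_attributes):
    """Upper bound on the fraction of grid cells that n_objects can occupy."""
    cells = bins_per_attribute ** n_attributes
    return min(1.0, n_objects / cells)


# 1,000 objects, 10 bins per attribute, increasing number of attributes.
for n_attributes in (1, 2, 3, 4, 6):
    fraction = max_occupied_fraction(1000, 10, n_attributes)
    print(n_attributes, fraction)
```

With three attributes, 1,000 objects could in principle cover every cell; with six attributes they can cover at most one cell in a thousand, which is why a model may see no examples at all for most regions of the space.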
For clustering, the definitions of density and the distance between points, which are critical for clustering, become less meaningful. Finally, coming to the types of data sets, we divide them into three categories, namely Record Data, Graph-based Data, and Ordered Data. Let’s have a look at them one at a time.
Record Data
Source: Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar.
→ The majority of data mining work assumes that data is a collection of records (data objects).
→ The most basic form of record data has no explicit relationship among records or data fields, and every record (object) has the same set of attributes.
Transaction or Market Basket Data: It is a special type of record data, in which each record contains a set of items. For example, consider shopping in a supermarket or a grocery store. For any particular customer, a record will contain the set of items purchased by the customer during that visit to the store. This type of data is called Market Basket Data. Transaction data is a collection of sets of items, but it can be viewed as a set of records whose fields are asymmetric attributes. Most often, the attributes are binary, indicating whether or not an item was purchased.
The Data Matrix: If the data objects in a collection of data all have the same fixed set of numeric attributes, then the data objects can be thought of as points (vectors) in a multidimensional space, where each dimension represents a distinct attribute describing the object. A set of such data objects can be interpreted as an m × n matrix, where there are m rows, one for each object, and n columns, one for each attribute. Standard matrix operations can be applied to transform and manipulate the data. Therefore, the data matrix is the standard data format for most statistical data.
The Sparse Data Matrix: A sparse data matrix (sometimes also called a document-data matrix) is a special case of a data matrix in which the attributes are of the same type and are asymmetric; i.e., only non-zero values are important.
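The relationship between these three forms of record data can be sketched in a few lines of Python. The baskets below are made up for illustration, and the helper names (`to_binary_matrix`, `to_sparse`, `sparsity`) are hypothetical: the sketch converts transactions into a binary data matrix with one row per transaction and one column per item, then keeps only the non-zero entries as a sparse representation.

```python
# Made-up market basket data: one set of purchased items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
]


def to_binary_matrix(transactions):
    """Return (items, matrix) with one row per transaction, one column per item."""
    items = sorted(set().union(*transactions))
    matrix = [[1 if item in basket else 0 for item in items] for basket in transactions]
    return items, matrix


def to_sparse(matrix):
    """Keep only the non-zero entries, keyed by (row, column)."""
    return {
        (i, j): value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    }


def sparsity(matrix):
    """Fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total


items, matrix = to_binary_matrix(TRANSACTIONS)
sparse = to_sparse(matrix)
print(items)
print(matrix)
print(len(sparse), sparsity(matrix))
```

In this toy example the matrix has 3 rows (m) and 6 columns (n); with thousands of distinct items per store, almost every entry would be zero, which is why the sparse form is preferred in practice.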
Graph-based Data
Source: Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar.
This can be further divided into two types:
Data with Relationships among Objects: The data objects are mapped to nodes of the graph, while the relationships among objects are captured by the links between objects and link properties, such as direction and weight. Consider Web pages on the World Wide Web, which contain both text and links to other pages. In order to process search queries, Web search engines collect and process Web pages to extract their contents.
Data with Objects That Are Graphs: If objects have structure, that is, the objects contain sub-objects that have relationships, then such objects are frequently represented as graphs. For example, the structure of chemical compounds can be represented by a graph, where the nodes are atoms and the links between nodes are chemical bonds.
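As a rough illustration of data with relationships among objects, the sketch below stores a handful of invented web pages as graph nodes and their hyperlinks as directed links in an adjacency mapping, then counts incoming links per page. The page names and function names are hypothetical.

```python
# Invented directed links between web pages: (source page, target page).
LINKS = [
    ("home", "about"),
    ("home", "blog"),
    ("blog", "about"),
    ("about", "home"),
]


def build_graph(edges):
    """Map each node to the set of nodes it links to."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # make sure link targets appear as nodes
    return graph


def in_degree(graph):
    """Count the incoming links of every node."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


graph = build_graph(LINKS)
incoming = in_degree(graph)
print(graph)
print(incoming)
```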
Ordered Data
Source: Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar.
For some types of data, the attributes have relationships that involve order in time or space. It can be segregated into four types:
Sequential Data: Also referred to as temporal data, it can be thought of as an extension of record data, where each record has a time associated with it. Consider a retail transaction data set that also stores the time at which the transaction took place.
Sequence Data: Sequence data consists of a data set that is a sequence of individual entities, such as a sequence of words or letters. It is quite similar to sequential data, except that there are no time stamps; instead, there are positions in an ordered sequence. For example, the genetic information of plants and animals can be represented in the form of sequences of nucleotides that are known as genes.
Time Series Data: Time series data is a special type of sequential data in which each record is a time series, i.e., a series of measurements taken over time. For example, a financial data set might contain objects that are time series of the daily prices of various stocks.
Spatial Data: Some objects have spatial attributes, such as positions or areas, as well as other types of attributes. An example of spatial data is weather data (precipitation, temperature, pressure) that is collected for a variety of geographical locations.
This concludes this post on types of data sets.
What are key data elements?
This week, I’d like to borrow my mantra from George Orwell’s Animal Farm: “All animals are equal, but some animals are more equal than others.” I believe the same statement is true for every organization’s data assets. Every organization’s data is an asset at its disposal, to be exploited for an optimal competitive edge and the realization of unparalleled value.
But some data are more important than others. I believe this mindset has to take center stage as you begin to strategize and plan for your Data Governance adoption journey. While the appetite for governed, trusted data in the current global economy continues to grow like at no other time, it is important to be realistic and know how to better scope out your delivery.
The reality is that you cannot truly govern all your organization’s data in one go. You simply cannot boil the ocean and govern the universe of your data. You will have to strategize and call out what should swim to the top of your priorities in terms of governance rollout and execution.
This is where you’ll need to shift your mindset and start identifying and weighing the true importance of each piece of data in your ecosystem. This is where you simply have to identify the big fish in your pond for the right treatment of care and order of execution. Every organization must define key drivers for Data Governance adoption to identify its key data elements using defined criteria or a questionnaire.
The driving criteria and questionnaire must be reviewed and agreed upon by all stakeholders and used as a guide to weigh the importance of each enterprise data element.
What is a Key Data Element?
A Key Data Element (KDE), also referred to as a Critical Data Element (CDE), can be defined as an element with material impact on your organization’s business operations, decisions, and other data demands, i.e., regulatory, compliance, and market demands.
Why should you identify and prioritize your Key Data Elements?
We can argue that all your enterprise data are key to the survival and operation of your business. Hence, all data needs to be governed at some level of data maturity to realize value from it. However, there are some data your organization depends on heavily for its survival, market growth, and credibility.
Your task in your KDE prioritization effort is to call out, in a timely manner, the data with the highest premium that needs the greatest care. These Key Data Elements or Critical Data Elements need to be called out in recognition of the fact that not all data elements are of equal importance to your business operations.
Identification of those key data elements will allow you to enact more effective controls and business rules around them, and to measure and monitor their quality based on organizational usage. The following are some of the reasons it’s important to identify your Key Data Elements and rightfully prioritize them for the needed preventive standards of care through governance:
· It allows you to be focused and lean in setting reasonable expectations for your Data Governance adoption and execution. It further reinforces and sends a clear message that Data Governance is a journey of cultural transformation and not a project.
· It provides a rich opportunity to concentrate your resources on your highest-worth data assets first, before addressing the quality needs of other data.
· It positions you for quick wins by devoting your time to the areas of highest quality pain with the biggest impact.
· It accelerates the credibility and acceptance of your Data Governance adoption and organizational buy-in.
· It makes for an easier case for continuous funding and easier translation of ROI.
· It helps set the tone for tracking the overall success of your Data Governance adoption.
· It will help drive the level of priority for establishing data standards, developing data quality rules, defining thresholds for defects, and remediating issues.
What are the factors you must consider before identifying your key data elements?
There are several factors you should consider as you embark on your KDE identification process.
Your KDE identification process should be driven by your organization’s data demands and strategic vision. You’ll need to understand your organization’s appetite to fulfill its competing demands and how you plan to exploit your data assets to drive those aspirational goals. These drivers should already be part of your governance framework.
Your scope and drivers for data governance should drive your KDE identification process. The intent of KDE identification is to serve as a tool that drives your governance execution priorities and sets the right expectations for the scope and order of your delivery.
This identification process will also be used to manage your data quality issue remediation process. In effect, you’re using it to drive the order in which you improve the health of your data and resolve ongoing anomalies that hurt your business operations.
Your KDE prioritization effort must be facilitated by the Data Governance team as a joint, collaborative effort with subject matter experts across all your lines of business and support functions.
Each enterprise data community must be represented in the conversation. To this end, you need to develop a solid methodology, in the form of a guiding questionnaire, to help you assess each data element and rank it appropriately. Your questionnaire should contain questions based on your organization’s strategic vision, your competing data demands, and your industry climate.
Here’s what a typical KDE identification questionnaire in a highly regulated financial sector with competing data demands might look like:
1) Is this element currently highlighted as part of a regulatory data element flagged as a Matter Requiring Attention (MRA) or Matter Requiring Immediate Attention (MRIA) by your regulator, compliance body, or auditors?
2) Does this element have a material impact on your business?
3) Do you use this element to derive or calculate another element already flagged as a KDE?
4) Does this data have an impact on other elements?
5) Do you use this element in multiple internal/external reports or in other key decision making?
6) Will the quality of this data element impact your customer loyalty?
7) Will the quality of this data impact your model and analytics calculations?
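One way to turn a questionnaire like this into a ranking is to weight each question and sum the "yes" answers per element. The sketch below is illustrative only: the question keys, the weights, the threshold of 8, and the three example elements are all assumptions made for this example, not values from any real governance framework. Your stakeholders would need to agree on their own.

```python
# Hypothetical weights for the seven questionnaire items; higher means the
# question counts more toward KDE status. These values are assumptions.
QUESTION_WEIGHTS = {
    "regulatory_mra_or_mria": 5,
    "material_business_impact": 4,
    "derives_existing_kde": 3,
    "impacts_other_elements": 2,
    "used_in_key_reports_or_decisions": 3,
    "affects_customer_loyalty": 2,
    "affects_models_or_analytics": 2,
}

KDE_THRESHOLD = 8  # assumed cut-off; elements at or above it are treated as KDEs


def kde_score(answers: dict) -> int:
    """Sum the weights of every question answered 'yes' (True)."""
    unknown = set(answers) - set(QUESTION_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown question keys: {sorted(unknown)}")
    return sum(w for q, w in QUESTION_WEIGHTS.items() if answers.get(q, False))


def rank_elements(assessments: dict) -> list:
    """Return (element, score, is_kde) tuples, highest score first."""
    scored = [(name, kde_score(ans)) for name, ans in assessments.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return [(name, score, score >= KDE_THRESHOLD) for name, score in scored]


# Made-up assessments for three made-up data elements.
assessments = {
    "customer_tax_id": {
        "regulatory_mra_or_mria": True,
        "material_business_impact": True,
        "used_in_key_reports_or_decisions": True,
    },
    "marketing_opt_in": {"affects_customer_loyalty": True},
    "loan_balance": {
        "material_business_impact": True,
        "derives_existing_kde": True,
        "affects_models_or_analytics": True,
    },
}

ranking = rank_elements(assessments)
for name, score, is_kde in ranking:
    print(f"{name}: score={score}, KDE={is_kde}")
```

A simple weighted sum keeps the ranking easy to explain to stakeholders. A real methodology might instead treat some questions, such as the regulatory one, as automatic qualifiers.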
It is highly recommended that you periodically review your KDE prioritization methodology to validate that the KDE definitions are still valid, complete, and accurate. In a nutshell, this questionnaire will help you classify each data set at the right level for prioritization and for applying the needed governance standards of care as you embark on your Data Governance adoption journey.
This will set you on the right path for a successful adoption, as you focus your governance activities on the data sets with the highest sensitivity and urgency to help your organization realize its governance goals.
What is a set of data elements?
In the DICOM standard, a Data Set represents an instance of a real world Information Object. A Data Set is constructed of Data Elements. Data Elements contain the encoded Values of Attributes of that object. The specific content and semantics of these Attributes are specified in Information Object Definitions (see PS3.3).
What are the 3 parts of a data element?
A data element has:
· An identification, such as a data element name.
· A clear data element definition.
· One or more representation terms.
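The three parts can be sketched as a small record type. This is a minimal illustration, not a standard schema: the class name `DataElement`, its field names, and the example element "Patient Birth Date" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record holding the three parts of a data element."""
    name: str                    # the identification
    definition: str              # the clear definition
    representation_terms: tuple  # one or more terms, e.g. "Date" or "Code"

    def __post_init__(self) -> None:
        # Reject elements that are missing any of the three parts.
        if not self.name.strip():
            raise ValueError("a data element needs a name")
        if not self.definition.strip():
            raise ValueError("a data element needs a definition")
        if not self.representation_terms:
            raise ValueError("a data element needs at least one representation term")


# Example element; the wording of the definition is made up.
birth_date = DataElement(
    name="Patient Birth Date",
    definition="The calendar date on which the patient was born.",
    representation_terms=("Date",),
)
print(birth_date)
```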
How do you identify data elements?
Definition of data element: the fundamental data structure in a data processing system. Any unit of data defined for processing is a data element; for example, ACCOUNT NUMBER, NAME, ADDRESS and CITY. A data element is defined by size (in characters) and type (alphanumeric, numeric only, true/false, date, etc.).
A specific set of values or range of values may also be part of the definition. Technically, a data element is a logical definition of data, whereas a field is the physical unit of storage in a record. For example, the data element ACCOUNT NUMBER, which exists only once, is stored in the ACCOUNT NUMBER field in each customer record and in the ACCOUNT NUMBER field in each order record.
The Basic Unit of Storage: technically, data elements describe the logical unit of data, fields are the actual storage units, and data items are the individual instances of the data elements. In practice, all three terms may be used interchangeably. However, technical documentation on database management should employ the terms properly.
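The distinction between a data element (one logical definition), a field (a storage slot in each record), and a data item (a stored value) can be sketched as follows. The class `ElementDefinition`, the 8-character size, and the sample records are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementDefinition:
    """Logical definition of a data element: it exists once and is shared."""
    name: str
    size: int                            # maximum length in characters
    kind: str                            # "numeric" is checked; other kinds are not
    allowed: Optional[frozenset] = None  # optional set of permitted values

    def validate(self, value: str) -> bool:
        """Check one data item against the definition."""
        if len(value) > self.size:
            return False
        if self.kind == "numeric" and not value.isdigit():
            return False
        return self.allowed is None or value in self.allowed


# One data element ...
ACCOUNT_NUMBER = ElementDefinition(name="ACCOUNT NUMBER", size=8, kind="numeric")

# ... stored in a field of the same name in two different record types.
customer_record = {"ACCOUNT NUMBER": "10023456", "NAME": "A. Example"}
order_record = {"ACCOUNT NUMBER": "10023456", "ORDER ID": "77"}

# Each stored value is a data item, validated against the single definition.
checks = [
    ACCOUNT_NUMBER.validate(record["ACCOUNT NUMBER"])
    for record in (customer_record, order_record)
]
print(checks)
```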
What is a core data element?
(2) Core data elements: The term “core data elements” means data elements relating to financial management, administration, or management that: (A) are not program-specific in nature or program-specific outcome measures, as defined in section 1115(h) of this title; and (B) are required by agencies for all or the vast majority of recipients of Federal awards for purposes of reporting.
What is the difference between data element and data column?
A data element specifies the type of data a column contains, which in turn determines the transforms that can be applied in a Transformer stage. The use of data elements is optional. You do not have to assign a data element to a column, but it enables you to apply stricter data typing in the design of server jobs.
What is the use of data elements?
Techopedia Explains Data Element – There are two types of data elements:
· Elementary Data Elements: Defined by the built-in values of data type and length
· Reference Data Elements: Make use of reference variables mostly used in other ABAP objects
Data elements are used to define the characteristics of a table field or a component of a structure. They are also used to define the row type of the table type. The meaning of the table field or structure component along with editing screen fields can be mapped to a data element.
All of this information is available automatically to screen fields that are referenced to a data element. ABAP programs can make use of data elements by referencing the keyword TYPE. In this way, variables used in an ABAP program can take the characteristics or attributes of the referenced data elements.
It is always advisable to use built-in SAP data elements before creating new ones.
What are data elements and why are they important?
Definition of key data terms used in this article – Before we continue, let’s go through a few data-related terms used in this article (Mahanti, 2021):
· Data entities are the real-world objects, concepts, events, and phenomena about which we collect data. For example, customer is one of the most common entities.
· Data elements are the different attributes that describe the data entity. For example, data elements of the customer entity might be a unique id to identify the customer, customer name, date of birth and status.
· Data quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. Data quality is an evaluation of data’s fitness to serve their purpose in a given context. Data are considered of high quality if they are fit for their intended use (Mahanti 2019).
· Data quality dimensions are characteristics that define the quality of data. Data quality dimensions provide a means to quantify and manage the quality of data (Mahanti 2019). Referring to the “customer data entity” in our example, this would relate to the presence of useful values for each of the data elements in each record of the customer data entity, such as timely availability of the data, and accuracy and currency of the data.
· Data governance is the exercise and enforcement of policies, processes, guidelines, rules, standards, metrics, controls, decision rights, roles, responsibilities, and accountabilities to manage data as a strategic enterprise asset (Mahanti 2021a).
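To make the idea of a data quality dimension concrete, the sketch below measures one dimension, completeness (the presence of a useful value), for each data element of the customer entity. The sample records, the `is_present` rule, and the function names are assumptions for illustration. Other dimensions such as accuracy or currency would need their own checks.

```python
# Made-up customer records using the data elements named in the example:
# unique id, customer name, date of birth, and status.
customers = [
    {"customer_id": "C001", "name": "Ada", "date_of_birth": "1990-01-01", "status": "active"},
    {"customer_id": "C002", "name": "", "date_of_birth": None, "status": "active"},
    {"customer_id": "C003", "name": "Lin", "date_of_birth": "1985-07-12", "status": None},
    {"customer_id": "C004", "name": "Sam", "date_of_birth": None, "status": "closed"},
]


def is_present(value) -> bool:
    """Treat None and blank strings as missing (an assumed rule)."""
    return value is not None and str(value).strip() != ""


def completeness(records: list, element: str) -> float:
    """Fraction of records that hold a useful value for the given element."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if is_present(record.get(element)))
    return filled / len(records)


elements = ("customer_id", "name", "date_of_birth", "status")
report = {element: completeness(customers, element) for element in elements}
for element, ratio in report.items():
    print(f"{element}: {ratio:.0%} complete")
```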
Why is it important to define the data elements in a data dictionary in healthcare?
Interpreting Data Element Definitions and Allowable Values – Every attempt has been made to comprehensively define The Joint Commission’s National Quality Measure data elements and allowable values in a manner that obviates the need for interpretation.