De-ID Overview
The problem to solve
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) does not
permit the use of protected health information (PHI) in research studies except
with the explicit consent of the patient. However, HIPAA does allow for the
creation of de-identified health information. For clinical researchers to use
clinical data in a way that complies with HIPAA, the records must therefore be
de-identified.
De-ID process
Extracting PHI and de-identifying it requires a defined and structured process.
The Clinical Research Informatics Service (CRIS) at the University of Pittsburgh
oversees the data extraction and de-identification process involving databases at
UPMC. CRIS serves as the IRB-designated "honest broker" for approved studies,
including those involving multiple data sources. This allows the researcher to have
CRIS coordinate data collection and linkage files across the multiple sources,
without risking re-identification by one of the data sources.
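The honest-broker arrangement described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `build_linkage`, the `mrn` and `study_id` field names, and the study ID format are hypothetical and do not describe how CRIS actually builds its linkage files. The idea shown is only that the broker keeps the mapping from real identifiers to study IDs, while the researcher receives records keyed by study ID.

```python
import secrets


def build_linkage(sources):
    """Assign one random study ID per patient across all data sources.

    `sources` maps a source name to a list of record dicts, each holding a
    hypothetical 'mrn' field. Returns (released_records, linkage), where
    `linkage` (mrn -> study_id) stays with the honest broker.
    """
    linkage = {}
    released = []
    for source_name, records in sources.items():
        for record in records:
            mrn = record["mrn"]
            if mrn not in linkage:
                study_id = "S" + secrets.token_hex(4)
                # Regenerate in the unlikely event of a collision.
                while study_id in linkage.values():
                    study_id = "S" + secrets.token_hex(4)
                linkage[mrn] = study_id
            # Copy every field except the identifier itself.
            row = {k: v for k, v in record.items() if k != "mrn"}
            row["study_id"] = linkage[mrn]
            row["source"] = source_name
            released.append(row)
    return released, linkage
```

Because the same study ID is reused whenever the same identifier appears in another source, the researcher can still join records across sources without ever seeing the identifier.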
Use of the De-ID program is limited to IRB-approved projects. CRIS works closely
with the clinical researcher prior to IRB submission to ensure that the de-identified
information can be used in the particular study. CRIS requires a copy of the IRB
approval letter.
De-ID mechanics
De-ID uses a set of heuristics to identify the presence of any of the 18 HIPAA
identifiers within the text. Supplemental dictionaries of geographic locations, hospital
names, and popular names found in the U.S. Census are also used to locate identifiable
text. The UMLS Metathesaurus is used to ensure that words or phrases that may be medical
terms containing proper names are preserved.
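As a rough illustration of the approach described above, the sketch below combines pattern matching, a name dictionary, and a list of protected medical phrases. It is not the De-ID implementation: the regular expressions, the tiny `NAME_DICTIONARY` and `MEDICAL_PHRASES` sets (stand-ins for the Census name lists and the UMLS Metathesaurus), and the function `find_identifiers` are all hypothetical.

```python
import re

# Hypothetical patterns for a few of the HIPAA identifier types.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Tiny stand-ins for the supplemental name dictionary and medical vocabulary.
NAME_DICTIONARY = {"parkinson", "smith", "jones"}
MEDICAL_PHRASES = {"parkinson disease", "parkinson's disease"}


def find_identifiers(text):
    """Return a sorted list of (start, end, label) spans that look like PHI."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))

    # Record where known medical phrases occur so names inside them are kept.
    lowered = text.lower()
    protected = []
    for phrase in MEDICAL_PHRASES:
        start = lowered.find(phrase)
        while start != -1:
            protected.append((start, start + len(phrase)))
            start = lowered.find(phrase, start + 1)

    # Flag dictionary names unless they fall inside a protected medical phrase.
    for match in re.finditer(r"[A-Za-z]+", text):
        if match.group().lower() not in NAME_DICTIONARY:
            continue
        inside_medical = any(
            s <= match.start() and match.end() <= e for s, e in protected
        )
        if not inside_medical:
            spans.append((match.start(), match.end(), "NAME"))
    return sorted(spans)
```

With the text `"Dr. Smith saw the patient for Parkinson disease. Call 412-555-0100."`, this sketch flags "Smith" and the phone number but leaves "Parkinson" alone because it is part of a protected medical phrase.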
De-ID replaces the identifiable text with specific tags. A name that appears multiple
times in a report is consistently replaced with the same tag, which improves the
readability of the report. The downside of applying De-ID is that a small amount of
clinical information is removed during the de-identification process. In our work to
date, we have found only minor problems with this.
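The consistent-tagging behavior can be sketched as follows. The tag format `[NAME-n]` and the function `replace_names` are assumptions chosen for illustration; the source does not specify the tags De-ID actually emits.

```python
import re


def replace_names(text, names):
    """Replace each name with a numbered tag, reusing the tag on repeats."""
    if not names:
        return text
    tags = {}

    def tag_for(match):
        key = match.group().lower()
        if key not in tags:
            # Hypothetical tag format; numbering follows first appearance.
            tags[key] = f"[NAME-{len(tags) + 1}]"
        return tags[key]

    # Longest names first so a longer name wins over a shorter one it contains.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(tag_for, text)
```

For example, `replace_names("Smith met Jones. Smith agreed.", ["Smith", "Jones"])` yields `"[NAME-1] met [NAME-2]. [NAME-1] agreed."`, so a reader can still tell that the first and third mentions refer to the same person.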
| The 18 HIPAA identifiers [45 CFR 164.514(b)(2)(i)] |
| 1. | Names of individuals, relatives, employers or household members |
| 2. | Geographical subdivisions smaller than a state, except the first 3 digits of zip codes; however, if the region contains fewer than 20,000 people, the entire zip code must be replaced |
| 3. | All elements of dates (except year) directly relating to an individual; all ages over 89 years must be grouped into a single category of 90 or older |
| 4. | Telephone numbers |
| 5. | Fax numbers |
| 6. | Electronic mail addresses |
| 7. | Social security numbers |
| 8. | Medical record numbers |
| 9. | Health plan beneficiary numbers |
| 10. | Other account numbers |
| 11. | License numbers |
| 12. | Vehicle identifiers |
| 13. | Device identifiers and serial numbers |
| 14. | Web universal resource locators (URLs) |
| 15. | Internet protocol (IP) addresses |
| 16. | Biometric identifiers (this item does not occur in free-text clinical reports) |
| 17. | Full-face photographic images or any other comparable image (this item does not occur in free-text clinical reports) |
| 18. | Any other unique identifying number, characteristic or code |
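Items 2 and 3 in the table are generalization rules rather than simple removals, so a small worked sketch may help. The helper names, the `[ZIP]` replacement tag, and the `small_prefixes` argument (a caller-supplied set of 3-digit prefixes for regions under the population threshold) are hypothetical and are not taken from De-ID itself.

```python
from datetime import date


def generalize_zip(zip_code, small_prefixes):
    """Keep the first 3 digits unless the prefix marks a small region."""
    prefix = zip_code[:3]
    if prefix in small_prefixes:
        # Region below the population threshold: replace the whole zip code.
        return "[ZIP]"
    return prefix


def generalize_date(d):
    """Keep only the year of a date relating to an individual."""
    return str(d.year)


def generalize_age(age):
    """Group all ages over 89 into a single '90 or older' category."""
    return "90 or older" if age > 89 else str(age)
```

Using a placeholder prefix set `{"999"}`, `generalize_zip("15213", {"999"})` returns `"152"`, `generalize_date(date(2004, 3, 17))` returns `"2004"`, and `generalize_age(93)` returns `"90 or older"`.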