Blog: CAPriCORN – A resource for health researchers to request deidentified EHR data

Chicago Area Patient-Centered Outcomes Research Network

What is CAPriCORN?

Lurie Children's Hospital, AllianceChicago, CCHHS, Northwestern, University of Chicago Medicine, University of Illinois Hospital, NorthShore, Loyola Medicine, Rush University Medical Center, all connected to CAPriCORN in the center

Are you a health researcher looking to extend your research beyond UIC and into other Chicago-based healthcare institutions? Then CAPriCORN, which offers the opportunity to acquire a dataset from multiple sites in Chicago, may be the tool you’re looking for. This blog post looks at what CAPriCORN is and how you might use it in your research. CAPriCORN can be thought of as a source of de-identified electronic health record (EHR) data from Chicago-area healthcare providers. While the providers who participate in CAPriCORN change from time to time, the current participating institutions are shown in the accompanying image.

How much data is in CAPriCORN?

In short: a lot. CAPriCORN contains:

• 9.5 million patients
• Data as far back as 2011 through the present, with quarterly refreshes
• 74,000+ healthcare providers
• 14 affiliated hospitals
• 300+ primary care practices
• 21 Federally Qualified Health Centers (FQHCs)
• 11 safety net facilities
• 23 hospice centers
• 8 specialty children’s facilities

What are the benefits of working with CAPriCORN data?

Average Turnaround Time for CHAIRb Review: Convened Reviews (PHI/Consent): 21 days; Expedited reviews (Limited PHI/Consent Determined by IRB): 9 days; Non-human subject determinations (No PHI/No Consent Necessary): 0.75 days

One key to CAPriCORN is that it can return de-identified electronic health record data to a researcher.

In this way, using CAPriCORN data dramatically lowers three often-cited barriers to EHR research:

  1. Using CAPriCORN data makes it straightforward to acquire data sharing agreements with a non-UIC health care system (or many of them!)
  2. CAPriCORN has a system for identifying that a single person received care in multiple institutions, while preserving the privacy of that person (and allowing the data to continue to be “de-identified” for IRB purposes)
  3. CAPriCORN has its own IRB, to avoid navigating IRB approval for human subjects research across multiple institutions

The central IRB for CAPriCORN-supported studies is the Chicago Area Institutional Review Board (CHAIRb). For research that uses de-identified datasets (some examples below), IRB approval can occur in less than a day.

How would a researcher use CAPriCORN?

While CAPriCORN can help the researcher develop their query, the researcher will need to be able to clearly describe:

  • population
  • exposure
  • outcome
  • other variables of interest (e.g.: potential confounders)

To begin their research, a researcher will need to know how to identify each of these in electronic health records. For example, the researcher will need to know the procedure or diagnosis codes that would identify the patients of interest.
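As a rough illustration of what "identifying patients by diagnosis code" can look like, here is a minimal sketch in Python. The record layout, the `Diagnosis` class, and the sample rows are all hypothetical and invented for this example; they do not reflect how CAPriCORN or any participating institution stores data. The only clinical fact assumed is that ICD-10 category C50 covers malignant neoplasms of the breast, and any real code list should be confirmed with someone experienced in clinical coding.

```python
from dataclasses import dataclass


# Hypothetical record layout; real EHR extracts will be structured differently.
@dataclass(frozen=True)
class Diagnosis:
    patient_id: int
    icd10_code: str


# ICD-10 category C50 covers malignant neoplasms of the breast. Treat this
# prefix as a starting point, not a validated code list.
BREAST_CANCER_PREFIXES = ("C50",)


def patients_with_diagnosis(records, prefixes):
    """Return the set of patient IDs with at least one code matching a prefix."""
    return {
        r.patient_id
        for r in records
        if r.icd10_code.upper().startswith(prefixes)
    }


# Made-up rows, for illustration only.
records = [
    Diagnosis(1, "I10"),
    Diagnosis(2, "C50.911"),
    Diagnosis(2, "E11.9"),
    Diagnosis(3, "c50.112"),  # lower case on purpose: codes are normalized
]

cohort = patients_with_diagnosis(records, BREAST_CANCER_PREFIXES)
print(sorted(cohort))
```

Writing the population, exposure, and outcome down as explicit code lists like this, before submitting a request, tends to make the conversation about the query much more concrete.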

There are multiple ways that CAPriCORN can be integrated into research. The following examples highlight three of the most common endpoints that CAPriCORN can provide (in increasing order of complexity and cost): aggregate counts, patient-level but de-identified data, and limited data sets.

Research Example 1: Aggregate data

The requestor emails the Front Door, which connects with CAPriCORN Central, which sends a query. An encrypted response is sent to CAPriCORN Central, then to the Front Door, then to the requestor.

How many women aged 40-50 of each race/ethnicity received mammograms at CAPriCORN institutions in 2020?

This type of query would be classified as a “prep to research” query, and would likely receive a non-human subjects determination from CHAIRb. Given its simplicity, it would be executed at no cost.

If a researcher were interested in this kind of question, they would submit a “front door” request outlining the following information clearly:

  • population (women aged 40-50)
  • outcome (mammograms)
  • other variables of interest (race/ethnicity)

CAPriCORN will send the request to each institution, which will return the counts from its EHR system to CAPriCORN. At CAPriCORN, these will be further aggregated into total counts and returned to the researcher.
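The aggregation step described above can be pictured with a small Python sketch. The site names, group labels, and counts below are entirely made up, and the function `combine_site_counts` is a hypothetical helper written for this post, not part of any CAPriCORN tooling. It only shows the idea that each site reports counts and only the combined totals reach the researcher.

```python
from collections import Counter

# Hypothetical per-site counts, invented purely for illustration.
# Each site reports how many patients matched the query in each group.
site_counts = {
    "site_a": {"Group 1": 120, "Group 2": 80},
    "site_b": {"Group 1": 45, "Group 3": 30},
    "site_c": {"Group 2": 60, "Group 3": 15},
}


def combine_site_counts(per_site):
    """Sum the per-site counts into one total per group."""
    total = Counter()
    for counts in per_site.values():
        total.update(counts)  # adds counts for matching keys
    return dict(total)


totals = combine_site_counts(site_counts)
for group, n in sorted(totals.items()):
    print(group, n)
```

Note that a group missing at one site (for example, "Group 3" at `site_a`) is simply treated as contributing zero to the total.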

Research Example 2: Patient-level but still de-identified

Of women who received a screening mammogram in 2019, how many were diagnosed with breast cancer? Of those who were diagnosed with breast cancer, how many had surgery (and was that surgery a mastectomy or a lumpectomy)? Were women who lived in Cook County more likely to receive a mastectomy, and does controlling for age change this relationship?

A query of this type is more complex, and would require some funding (approximately $10,000-$15,000 for UIC researchers). However, this kind of research question would be likely to receive a non-human subjects research determination from CHAIRb, as it can still be answered using de-identified data.

As with the aggregate data example, the researcher would need to be able to identify in EHR data the key aspects of their research question:

  • population (women who received a mammogram in 2019)
  • outcome (diagnosis of breast cancer within a year)
  • outcome (mastectomy or lumpectomy within six months)
  • exposure (Cook County resident)
  • other variable of interest (age)

Once these elements were identified, the query would be sent by CAPriCORN to each of the participating healthcare institutions. Unlike the aggregate data example, in this case each institution would return to CAPriCORN one line of data for each patient who had a mammogram in 2019. These would be combined by CAPriCORN and, after ensuring that the risk of reidentification is low, the researcher would receive patient-level data similar to this schematic table:

Patient data

de-identified patient ID | age at mammogram | Cook County resident | breast cancer dx within a year of mammogram | mastectomy within 6 months of dx | lumpectomy within 6 months of dx
1 | 54 | 1 | 0 | - | -
2 | 60 | 0 | 1 | 1 | 0
3 | 74 | 1 | 0 | - | -
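To give a sense of how a dataset shaped like the schematic table might be handled, here is a minimal Python sketch that uses only the three schematic rows shown above. The shortened column names (`cook_county`, `bc_dx`, and so on) and both helper functions are hypothetical choices made for this example. It reads "-" as "not applicable" and counts diagnosed patients and mastectomies by Cook County residence.

```python
import csv
import io

# The three schematic rows from the table; "-" means "not applicable".
# Column names are shortened, hypothetical labels.
SCHEMATIC = """patient_id,age,cook_county,bc_dx,mastectomy,lumpectomy
1,54,1,0,-,-
2,60,0,1,1,0
3,74,1,0,-,-
"""


def parse_flag(value):
    """Convert "0"/"1" to an int and "-" to None (not applicable)."""
    return None if value == "-" else int(value)


def load_rows(text):
    """Parse the CSV text into a list of typed dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "patient_id": int(raw["patient_id"]),
            "age": int(raw["age"]),
            "cook_county": int(raw["cook_county"]),
            "bc_dx": int(raw["bc_dx"]),
            "mastectomy": parse_flag(raw["mastectomy"]),
            "lumpectomy": parse_flag(raw["lumpectomy"]),
        })
    return rows


def surgery_by_residence(rows):
    """Count diagnosed patients and mastectomies, split by Cook County flag."""
    summary = {}
    for row in rows:
        if row["bc_dx"] != 1:
            continue  # surgery columns only apply to diagnosed patients
        group = summary.setdefault(
            row["cook_county"], {"diagnosed": 0, "mastectomy": 0}
        )
        group["diagnosed"] += 1
        group["mastectomy"] += row["mastectomy"] or 0
    return summary


rows = load_rows(SCHEMATIC)
print(surgery_by_residence(rows))
```

With only three schematic rows, the output is not meaningful in itself. In a real study, a question such as whether controlling for age changes the relationship would typically call for a regression model rather than simple counts.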

Research Example 3: Limited data sets

A unique aspect of CAPriCORN is that it also contains a mechanism that would allow a researcher to access not just de-identified datasets, but also limited data sets that involve data from CAPriCORN-participating institutions. This includes more specific data on geography (such as zip code of residence), and more specificity about dates. This would allow the researcher to develop a longitudinal cohort out of the administrative data, to examine more complicated outcomes, such as hospital readmissions. It’s also possible, using the privacy-preserving record linkage, to merge outside data with CAPriCORN. One example is a recent project on homelessness.

Does receiving housing services reduce morbidity in people experiencing homelessness?

In this project, external data about housing services were merged with EHR data from six CAPriCORN institutions through an “honest broker” system. With this system, the data that were returned to the researchers were stripped of identifiers, preserving the privacy of the individuals who were experiencing homelessness. Using these data, the researchers were able to identify increases in behavioral health morbidity associated with lower housing support services, and also to quantify how unstable housing was associated with additional emergency room visits.
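This post does not describe how CAPriCORN's privacy-preserving record linkage works internally. As a general illustration of the idea only, the Python sketch below shows one commonly used approach: each data holder turns normalized identifiers into a keyed hash (HMAC-SHA256), and records are matched on that token rather than on names or dates of birth. The function `linkage_token`, the secret key, the field choices, and the fictional person are all assumptions made for this example, not CAPriCORN's actual method.

```python
import hashlib
import hmac


def linkage_token(first, last, dob, secret_key):
    """Build a keyed hash from normalized identifiers (illustrative only)."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(
        secret_key, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Hypothetical secret shared by the data holders, never by the researcher.
SECRET = b"hypothetical-shared-secret"

# Fictional person and made-up values; each source keys its rows by token only.
ehr = {
    linkage_token("Ana", "Lopez", "1980-02-03", SECRET): {"ed_visits": 4},
}
housing = {
    linkage_token(" ana", "LOPEZ ", "1980-02-03", SECRET): {"housing_services": 1},
}

# Link on tokens present in both sources; no names or birth dates are exposed.
linked = [{**ehr[t], **housing[t]} for t in ehr.keys() & housing.keys()]
print(linked)
```

Because the identifiers are normalized before hashing, small formatting differences (extra spaces, capitalization) still produce the same token, while the researcher only ever sees the linked, identifier-free rows.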

What should I do if I’m interested in using CAPriCORN data?

If you’re interested in CAPriCORN, you can go directly to the “front door” request page. If you have questions, please contact Dr. Howard Gordon; the UIC-based CAPriCORN investigators can help you get started.