Characterizing Today's Health Information Technology Toolbox Through a Comprehensive Environmental Scan
Event Type: Poster Presentation
Time: Thursday, April 15, 2:09pm - 2:10pm EDT
Location: Digital Health
Advances in electronic health records (EHRs) and data science have resulted in rapid growth of healthcare data. However, these advances are of limited use without methods that can analyze big healthcare data at the volume, velocity, and veracity with which it is generated.
This scoping review identified open-source, open-standards health information technology (health IT) tools used for research. The following research questions guided the review.
Primary research question: What open-source tools are used by researchers throughout the cycle of working with big data, including data capture (i.e., data acquisition and entry), data maintenance (i.e., data movement, integration, cleansing, and enrichment), data analysis, data usage, data publication, and data archival?
Secondary research questions:
-What are the key technical, functionality, and usability characteristics of the open-source tools?
-What are critical needs in the development of open-source health IT tools for the future?
-How do healthcare researchers currently use Fast Healthcare Interoperability Resources (FHIR) during the data cycle?

The objective was to explore, identify, and describe open health IT tools for research through a comprehensive landscape analysis of the scientific literature and publicly available databases. Scoping reviews contextualize knowledge by systematically mapping the literature on a topic, identifying key concepts and sources of evidence, and identifying gaps in current research. Analyzing a wide range of research and non-research material allowed for a comprehensive scan. We used a version of the York framework for scoping reviews, modified for tool identification and evaluation. A comprehensive evaluation of open-source health IT tools, with consideration of potential enhancements or expansion for broader support and uptake, can support technical development of data factory platforms (FHIR Factories).

Using the modified York framework, we conducted a comprehensive search that combined three approaches: consultation with technical subject matter experts (SMEs) in big data; consultation with librarians to search the peer-reviewed and gray literature; and our own knowledge of existing open-source tools, which we used to search project repositories, the literature, and the web for open health IT tools. The results were compiled into a repository of solutions and similar tools. To understand tool features and characteristics, the project team extracted and charted information on licensing, documentation quality, community, longevity/pedigree, functionality, security, and support.
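The charting step can be pictured as filling in one record per tool. The following is a minimal sketch in Python, assuming a flat record with one field per charted category. The class name `ToolChart`, the field names and types, and the example values are hypothetical illustrations, not the project team's actual extraction form.

```python
from dataclasses import dataclass, asdict


@dataclass
class ToolChart:
    """One charted tool. All field names are hypothetical; they mirror
    the categories listed in the abstract, one field per category."""
    name: str
    licensing: str
    documentation_quality: str
    community: str
    longevity_pedigree: str
    functionality: str
    security: str
    support: str


# Placeholder values for illustration only; not data from the review.
example = ToolChart(
    name="example-tool",
    licensing="Apache 2.0",
    documentation_quality="user documentation and installation instructions present",
    community="stars, contributors, and issue activity on the hosting site",
    longevity_pedigree="age, forks, citations, and release history",
    functionality="data analysis",
    security="not stated",
    support="no paid support",
)

# A flat dictionary is convenient for writing rows to a spreadsheet or CSV.
row = asdict(example)
```

Keeping one field per category makes it straightforward to tabulate counts across tools afterwards.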

We identified and screened 3,343 items (2,707 articles and 636 tools). Of these, 121 tools met inclusion criteria and were critically reviewed and charted.
-Licensing: 38 (31.4%) tools had an Apache 2.0 license, 13 tools each (10.7%) had a GNU General Public License version 3.0 or 2.0, 10 tools each (8.26%) had an MIT license or a 3-clause BSD license, and 9 tools (7.4%) had a license we classified as ‘other’ because we were unable to determine the open-source license type from the hosting site.
-Languages: JavaScript and HyperText Markup Language (HTML) were the most popular languages, each used by at least 41 tools (12.3%), closely followed by Python (38 tools, 11.4%), Java (30 tools, 9%), and Shell (30 tools, 9%).
-Tool type: 82 tools (67.77%) were classified as standalone tools, 27 (22.31%) were classified as libraries or packages, 2 (1.65%) were classified as notebooks, 1 (0.83%) was a framework, and 9 (7.44%) items could not be classified.
-Documentation quality: User documentation was present in 98 (80.99%) tools, a software overview was present in 103 (85.12%) tools, installation instructions were present in 87 (71.9%) tools, and reference guides for functions were present in 72 (59.5%) tools.
-Community support: Tools had between 0 and 95,500 stars on GitHub and between 0 and 3,100 contributors. Time until an issue had a reply and was closed ranged from 0 to 107 days. The total number of open issues ranged from 0 to 2,298, and the total number of closed issues ranged from 0 to 31,028.
-Longevity/pedigree: Tools ranged in age from 1 to 20 years, had between 0 and 23,219 forks on GitHub, and had between 0 and 1,158 scientific citations. Tools had between 0 and 9 major releases, 0 and 35 minor releases, and 0 and 26 patches.
-Interoperability: Of the 121 tools, only 15 identified interoperability vocabulary/terminology standards, 7 identified interoperability content/structure standards, and 6 identified interoperability transport (service/exchange) standards in their documentation.
-Data lifecycle: Out of the 121 tools, 66 tools (54.54%) involved data analysis, 37 tools (30.58%) involved maintenance, 26 each (21.49%) involved capture and publication, 11 each (9.09%) involved usage or did not identify a data lifecycle stage, 3 (2.48%) involved archival, and 1 (0.83%) involved production. We also analyzed how many tools involved more than one stage. Fifteen tools (12.2%) involved Analysis + Publication, 14 tools (11.4%) involved Capture + Maintenance, 7 tools (5.7%) involved Maintenance + Analysis, 5 tools (4%) involved Publication + Maintenance + Analysis, 3 tools (2.4%) involved Capture + Maintenance + Analysis, 3 tools (2.4%) involved Capture + Analysis, 2 tools (1.6%) involved Usage + Analysis, 1 tool (0.8%) involved Usage + Maintenance, and 1 tool (0.8%) involved Usage + Capture.
-Support: Paid support was present in 16 (13.22%) tools.
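
Most percentages reported above are simple shares of the 121 charted tools. As a worked check, the short Python sketch below recomputes the tool-type shares from the reported counts. The variable and function names are illustrative assumptions, not part of the study's analysis code. Only the tool-type counts are used because those categories are mutually exclusive and sum to 121.

```python
TOTAL_TOOLS = 121  # tools that met inclusion criteria

# Counts as reported under "Tool type".
tool_type_counts = {
    "standalone tool": 82,
    "library or package": 27,
    "notebook": 2,
    "framework": 1,
    "unclassified": 9,
}


def share(count, total=TOTAL_TOOLS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# The categories are mutually exclusive, so the counts cover every tool.
assert sum(tool_type_counts.values()) == TOTAL_TOOLS

shares = {label: share(count) for label, count in tool_type_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The same `share` helper applies to any category counted against the 121 tools, such as paid support (16 tools).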

Findings from our comprehensive evaluation of open-source health IT tools can support the technical development of future, optimized health IT tools that facilitate healthcare delivery research. Developers can use these findings and recommendations to enhance data infrastructure design through iterative refinement. By first identifying challenges systematically through a landscape analysis, our project team aims to develop a novel health IT solution that enables researchers to pursue more complex questions with faster, more reliable discoveries.