Optimizing public health data collection from Internet of Things sensors: An integrated data-sharing platform
Event Type
Poster Presentation
Time
Thursday, April 15, 2:30pm - 2:31pm EDT
Location
Digital Health
Description
Background: Internet of Things (IoT) sensors are an essential source of data for data-driven healthcare and significantly expand the volume of public health data; however, there has been no standardized and optimized way for researchers to collect such data easily. In previous health informatics research, researchers have usually collected data by asking IoT device data donors (e.g. smart thermostat users) to send their data directly. Donors bear the burden of going through multiple steps, such as logging into their accounts, finding the historical data download function on the device manufacturer’s website, downloading the data, and uploading it to the researchers’ website. This process is time-consuming and, because of the amount of work involved, difficult for some data donors to follow, which may reduce their willingness to provide their data to researchers.

To address this issue, some smart home IoT device manufacturers have created dedicated channels for research data collection. For example, Ecobee has created a program called “Donate Your Data (DYD)”, which is integrated into users’ online accounts for voluntary participation. If users give their consent, the data collected from their IoT devices can be shared anonymously for research purposes. This program notably reduces the time, steps, and workload needed for donors to share their data. However, the data available from this program are not real-time but are only updated periodically in batches, so barriers to real-time health monitoring remain.

Some IoT manufacturers make their user and device data available through Application Programming Interfaces (APIs). An API allows a third-party program to access the manufacturer’s database using software scripts. It can serve as a gateway for researchers and developers to access, retrieve, and store both historical and real-time data collected from IoT sensors, enabling automatic data processing and transmission. Therefore, programs that use APIs present significant opportunities as efficient research tools for health informatics studies that need large-scale IoT sensor data.
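
The retrieval pattern described above can be sketched as a polling loop. This is a minimal illustration under stated assumptions, not any manufacturer's actual API: the function names, the `recorded_at` and `temperature_c` fields, and the stand-in `fake_fetch` are all hypothetical, and each batch is assumed to arrive in ascending time order with ISO 8601 UTC timestamps.

```python
import json
from typing import Callable, Dict, List


def collect_new_readings(
    fetch_json: Callable[[str], str], cursor: str, max_polls: int
) -> List[Dict]:
    """Poll a data source and keep only readings newer than the cursor.

    fetch_json is a hypothetical callable standing in for an HTTP request
    to a manufacturer API; it takes a timestamp and returns JSON text.
    """
    collected: List[Dict] = []
    for _ in range(max_polls):
        batch = json.loads(fetch_json(cursor))
        for reading in batch:
            # ISO 8601 UTC strings in the same format sort chronologically.
            if reading["recorded_at"] > cursor:
                collected.append(reading)
                cursor = reading["recorded_at"]
    return collected


def fake_fetch(since: str) -> str:
    """Stand-in for a real API call, returning made-up example readings."""
    data = [
        {"recorded_at": "2021-04-15T14:00:00Z", "temperature_c": 21.5},
        {"recorded_at": "2021-04-15T14:05:00Z", "temperature_c": 21.7},
    ]
    return json.dumps([r for r in data if r["recorded_at"] > since])


readings = collect_new_readings(fake_fetch, "2021-04-15T13:59:00Z", max_polls=2)
print(len(readings))  # 2
```

Passing the fetch step in as a function keeps the sketch runnable without network access; in practice it would be replaced by an authenticated request to whichever API the manufacturer provides.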

Considering the need for standardized and optimized IoT data collection methods, we aim to develop an integrated data-sharing platform that automatically retrieves real-time health-related data from IoT sensors. With API technology applied at the back-end, this platform is expected to minimize data donors’ workload and save time during the data collection phase. We compare this platform with existing data collection methods to clarify its potential advantages.

Objective: The primary goal of this study is to design an API-integrated platform for healthcare data collection from IoT sensors. Our secondary goal is to compare three data collection methods: (1) nonstandard methods that require manual operations by data donors; (2) data-sharing programs such as DYD; and (3) the platform integrated with API technology.

Methods Comparison: Different data collection methods can be compared along five scales: data completeness, data delay, data privacy, usability on the data donor side, and usability on the researcher side.

Existing method 1: This method relies on data donors to actively record or download their data and then upload it to researchers. Data completeness is therefore likely to be reduced considerably by operation errors during these complicated steps. Since this sharing process may occur only once a day, it cannot provide real-time data and introduces data delay. In addition, if the channel through which donors send the data is not anonymous, their personally identifiable information (e.g. name) can be leaked, raising a data privacy issue. Data donors may find it difficult to follow the steps of this method, and their response rate will decline if sharing is a regular task. Also, because data is acquired from different donors and the sharing process is not standardized, researchers may spend much time integrating and pre-processing the data.

Existing method 2: Research data programs provided by manufacturers, such as DYD, are uncommon. Although manufacturers hold all users’ data, which can guarantee data completeness, the data are only shared at fixed intervals, such as once a year. This can still cause data delay and hinders real-time data availability. However, the data from such a program are anonymized and coded before researchers acquire access, which can guarantee data privacy. The manufacturers have simplified the process for data donors, who only need to consent to join the program; their data can then be shared with partners of the program for research purposes. This places little burden on donors and can also increase their engagement. In addition, the data are pre-processed by manufacturers before being shared with researchers, so researchers need to put little effort into data integration. Furthermore, since such a program grants partners free access to the data, it can significantly reduce the study cost.

Proposed method: The API-integrated platform includes both a user-friendly front-end and an automated back-end. Similar to DYD, complete data with accurate values are retrieved directly from the manufacturers’ data pool in a standardized format (e.g. JSON); meanwhile, the back-end scripts can run continuously to stream instant data without any data delay. To ensure data privacy, sensitive information will not be collected; instead, a specific ID will be assigned to each donor to represent their identity. A user-friendly interface will be provided through which data donors can give their consent with simple click operations, which then builds a bridge between the platform and their remote data pool. After that, the platform can automatically retrieve the data with no further tasks for donors. For researchers, all data can be stored in a structured database without any additional data integration process, which will optimize data querying and subsequent analysis. Moreover, the real-time data provided presents more opportunities for further healthcare research.
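
As a rough sketch of the back-end storage step described above, the following assigns each donor a random ID and stores JSON readings in a structured database. The table layout, the field names (`recorded_at`, `temperature_c`), and the use of SQLite are assumptions made for illustration only; the abstract does not specify the platform's schema or database.

```python
import json
import sqlite3
import uuid


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two hypothetical tables used in this sketch."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS donors (donor_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "donor_id TEXT NOT NULL REFERENCES donors(donor_id), "
        "recorded_at TEXT NOT NULL, "
        "temperature_c REAL)"
    )
    return conn


def register_donor(conn: sqlite3.Connection) -> str:
    """Assign a random ID that represents a donor; no name or email is stored."""
    donor_id = uuid.uuid4().hex
    conn.execute("INSERT INTO donors (donor_id) VALUES (?)", (donor_id,))
    return donor_id


def store_reading(conn: sqlite3.Connection, donor_id: str, payload_json: str) -> None:
    """Keep only the whitelisted fields of a JSON reading; ignore everything else."""
    record = json.loads(payload_json)
    conn.execute(
        "INSERT INTO readings (donor_id, recorded_at, temperature_c) VALUES (?, ?, ?)",
        (donor_id, record["recorded_at"], record["temperature_c"]),
    )


conn = open_database()
donor_id = register_donor(conn)
# Made-up payload; the "name" field is deliberately never written to the database.
payload = '{"recorded_at": "2021-04-15T14:30:00Z", "temperature_c": 21.5, "name": "example"}'
store_reading(conn, donor_id, payload)
rows = conn.execute(
    "SELECT donor_id, recorded_at, temperature_c FROM readings"
).fetchall()
print(rows)
```

Copying only named fields into the table, rather than storing the raw payload, is one simple way to make sure sensitive attributes that happen to appear in a response are not collected.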

Conducting studies to quantitatively measure these five scales will be the next step. Data completeness can be quantified as the proportion of non-blank values relative to the potential complete dataset. Data privacy can be evaluated subjectively by using prepared survey questionnaires to measure data donors’ perceived level of privacy. The timestamp of the latest data that can be retrieved reflects the level of data delay. Usability will be measured through heuristic evaluation with a rating scale: evaluators will be assigned tasks from both the data donor side and the researcher side and will give subjective ratings while using and comparing the three methods. In addition, more objective metrics, such as task completion time, will be added to examine the methods. The comparison will therefore continue in future studies.
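
Two of these measures lend themselves to a short worked example. The sketch below computes data completeness as the share of non-blank values and data delay as the gap between the latest retrievable timestamp and the time of the query. The function names and sample values are hypothetical, and treating `None` as a blank value is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[float]], expected_count: int) -> float:
    """Proportion of non-blank values relative to the expected complete dataset."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    non_blank = sum(1 for v in values if v is not None)
    return non_blank / expected_count


def data_delay_seconds(latest_timestamp: datetime, query_time: datetime) -> float:
    """Seconds between the newest retrievable reading and the time of the query."""
    return (query_time - latest_timestamp).total_seconds()


# Made-up readings: four expected values, one of them blank.
sample = [21.5, None, 22.0, 21.8]
print(completeness(sample, expected_count=4))  # 0.75

latest = datetime(2021, 4, 15, 14, 0, tzinfo=timezone.utc)
queried = datetime(2021, 4, 15, 14, 5, tzinfo=timezone.utc)
print(data_delay_seconds(latest, queried))  # 300.0
```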

Application: This platform can be used in studies of public health monitoring through IoT sensors, especially studies involving “Big Data”, since it provides a reliable channel for data collection, which is a key component of such studies. Once this step is deployed, it becomes feasible to apply data analysis techniques such as machine learning and deep learning to Big Data. The potential advantage of this platform is that it will considerably increase data completeness and reduce data collection time; meanwhile, it can also relieve data donors of the operational burden of data sharing, which may result in a higher response rate. Furthermore, our platform will provide not only historical data but also real-time data. With historical data, public health researchers can understand users’ past behaviour routines and give them guidance to optimize their health. With instant data, researchers can develop a real-time health monitoring system with information visualization features, such as a dashboard that enables healthcare units to understand individuals’ health status and provide prompt care.