
Health records-linked Open Multi-consumer device Electrocardiogram (HOME) dataset

Overview

The HOME Benchmark is an evaluation-only resource designed to assess the cross-device generalization of AI models on consumer-grade single-lead ECG waveforms, including recordings from Apple Watch and QOCA ECG102D devices. It is released to enable fair and standardized model comparison, while explicitly prohibiting model training, fine-tuning, domain adaptation, and commercial exploitation. Ground-truth labels are intentionally withheld, and model performance is assessed exclusively through a controlled submission and evaluation process.

IMPORTANT USAGE NOTICE (PLEASE READ CAREFULLY)

The HOME Benchmark dataset MUST NOT be used for:

  1. Model training of any kind
  2. Fine-tuning or partial fine-tuning
  3. Domain adaptation or transfer learning
  4. Self-supervised, contrastive, or representation learning
  5. Feature extractor pretraining
  6. Parameter optimization or calibration

This prohibition applies even if ground-truth labels are not provided. Access to waveform data does not imply permission to use the data for learning representations or updating model parameters. The dataset is evaluation-only.

Rationale

This benchmark is intentionally designed to:

  1. Enable fair cross-device model comparison
  2. Prevent implicit domain adaptation
  3. Avoid benchmark leakage and overfitting
  4. Protect downstream clinical translation and commercialization

This governance model follows best practices used in high-impact medical AI benchmarks and challenges.

Dataset Structure

The overall file structure is as follows:

HOME
├── code
│   ├── all-in-one.R
│   ├── supporting-code
│   │   ├── ...
├── data-for-predicting
│   ├── QOCA_ECG102D_waveform.csv
│   ├── Apple_Watch_waveform.csv
│   ├── baseline-prediction
│   │   ├── 1-Gender 12-lead model (Apple).csv
│   │   ├── 1-Gender 12-lead model (QOCA).csv
│   │   ├── 1-Gender fine-tuning model (Apple).csv
│   │   ├── 1-Gender fine-tuning model (QOCA).csv
│   │   ├── ...
├── data-for-training
│   ├── train.csv
│   ├── ecg
│   │   ├── P00001.csv
│   │   ├── P00002.csv
│   │   ├── ...

The HOME Benchmark provides two waveform files corresponding to the Apple Watch ('Apple_Watch_waveform.csv') and QOCA ECG102D ('QOCA_ECG102D_waveform.csv') devices. Each file is organized in a wide tabular format, where each column represents a unique individual and each row corresponds to a time point in the ECG waveform.

Specifically, each file contains 1,000 columns, representing 1,000 unique individuals, and 6,000 rows, representing uniformly sampled signal points from a single-lead (Lead I) ECG recording. Waveform values are expressed in units of 0.01 mV, and the sampling frequency is 200 Hz, so each recording spans 30 seconds. There are no repeated patients within or across files; each ECG waveform corresponds to a distinct individual. Column names are unique identifiers (UIDs), which serve as the primary reference for task-specific evaluation and submission.
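
As a quick illustration, here is a minimal sketch of loading one waveform file with base R, assuming the working directory is the repository root; `check.names = FALSE` keeps the UID column headers unmodified.

```r
# Load one device file; columns are UIDs, rows are signal samples.
waveforms <- read.csv("data-for-predicting/Apple_Watch_waveform.csv",
                      check.names = FALSE)
dim(waveforms)  # expected: 6000 rows (samples) x 1000 columns (individuals)

# Convert one recording from raw units (0.01 mV) to mV and plot it;
# at 200 Hz, the 6,000 samples span 30 seconds.
uid <- colnames(waveforms)[1]
signal_mv <- waveforms[[uid]] * 0.01
time_s <- (seq_along(signal_mv) - 1) / 200
plot(time_s, signal_mv, type = "l", xlab = "Time (s)", ylab = "Lead I (mV)")
```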

Task-specific Prediction and Submission Workflow

For each prediction task, a predefined subset of UIDs is specified as the evaluation cohort for that task. Users are required to:

  1. Load the waveform data from the appropriate device file (Apple Watch or QOCA ECG102D).
  2. Select the ECG waveforms corresponding to the task-specific UIDs.
  3. Perform inference only using a pre-trained, fixed model.
  4. Generate predictions for the specified UIDs.

Submit the results following the exact format defined in the corresponding sample_submission file for each task (in the 'data-for-predicting/baseline-prediction' folder); a minimal sketch of the full workflow follows the list below. Each sample_submission file explicitly defines:

  1. The required UIDs for that task
  2. The expected submission format
  3. The prediction type (e.g., probability or continuous value)
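
The following is a hedged R sketch of this workflow for one Apple Watch task. The column positions in the sample file and the `run_inference()` function are hypothetical stand-ins (for the format defined by the task's sample_submission file and for your own fixed, pre-trained model); always defer to the actual sample file.

```r
# Use the provided baseline prediction file as the submission template.
sample_file <- "data-for-predicting/baseline-prediction/1-Gender 12-lead model (Apple).csv"
submission <- read.csv(sample_file, check.names = FALSE)

# Steps 1-2: load the device waveforms and select the task-specific UIDs
# (assumed here to be the first column of the sample file).
waveforms <- read.csv("data-for-predicting/Apple_Watch_waveform.csv",
                      check.names = FALSE)
task_uids <- as.character(submission[[1]])
task_waves <- waveforms[, task_uids, drop = FALSE]

# Steps 3-4: inference only. Replace this placeholder with your fixed,
# pre-trained model; no parameter may be updated using these data.
run_inference <- function(x) rep(0.5, ncol(x))  # hypothetical stand-in
submission[[2]] <- run_inference(task_waves)

# Write the predictions without altering UIDs, order, or structure.
write.csv(submission, "my_submission.csv", row.names = FALSE)
```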

For the baseline benchmarks, we provide two kinds of models. The first, the '12-lead model', is trained on Lead-I segments from resting 12-lead ECGs; the second, the 'fine-tuning model', is trained on Lead-I ECG signals from the corresponding devices. In this cohort, ground-truth labels are not provided, and model performance is evaluated exclusively through the centralized evaluation pipeline.

Evaluation & Submission Policy

Model performance is evaluated only through centralized scoring (https://ailab.ndmutsgh.edu.tw/app/home-benchmark). Users must submit prediction files for evaluation; labels are never released. To submit predictions, users must apply for an account. Account applications must be sent to Chin Lin (xup6fup0629@gmail.com).

Please include:

  1. Full name
  2. Institutional affiliation
  3. Academic email address
  4. Intended evaluation task(s)
  5. Username

Users who wish to become familiar with the evaluation workflow may log in using the following test account:

  • Username: test
  • Password: test

The test account allows unlimited submissions, but only with the system-provided default example data. Any modification to the example submission files (including changes to values, UIDs, or formatting) will be automatically rejected and cannot be uploaded. This test account is provided solely for demonstration and system familiarization and does not perform real evaluation.

Researchers who apply for and receive an approved account may submit predictions for official evaluation.
For each approved account:

  • Each task is limited to a maximum of 10 submissions
  • Submissions beyond this limit will not be evaluated
  • The submission cap is strictly enforced

This submission limit is implemented to prevent adaptive probing, iterative attacks, and potential label leakage, thereby preserving the integrity and fairness of the benchmark.

Summary of the benchmarks

| Device | Task Name | Total Subjects (n) | Cases (n) | Mean Age (years) | Male (%) | Baseline Lead-I AUC/r* | Fine-tuned Model AUC/r* |
|--------|-----------|--------------------|-----------|------------------|----------|------------------------|-------------------------|
| Apple | 1-Gender | 1000 | NA | 62.5 | 52.9% | 0.7068 | 0.7766 |
| Apple | 2-Age | 1000 | NA | 62.5 | 52.9% | 0.4819 | 0.5941 |
| Apple | 3-Death | 824 | 22 | 63.9 | 52.4% | 0.7794 | 0.7934 |
| Apple | 4-Low_EF | 761 | 24 | 64.0 | 53.4% | 0.7358 | 0.8419 |
| Apple | 5-High_PASP | 762 | 49 | 64.0 | 53.4% | 0.7334 | 0.7715 |
| Apple | 6-High_LA | 762 | 35 | 64.0 | 53.4% | 0.7357 | 0.8188 |
| Apple | 7-High_NT-proBNP | 100 | 38 | 68.4 | 50.0% | 0.8107 | 0.8169 |
| Apple | 8-Low_Hb | 172 | 38 | 63.3 | 55.2% | 0.6530 | 0.6967 |
| Apple | 9-Low_eGFR | 383 | 108 | 64.7 | 55.1% | 0.6935 | 0.7385 |
| QOCA | 1-Gender | 1000 | NA | 60.0 | 52.5% | 0.6166 | 0.7869 |
| QOCA | 2-Age | 1000 | NA | 60.0 | 52.5% | 0.3570 | 0.6104 |
| QOCA | 3-Death | 823 | 64 | 61.4 | 52.7% | 0.5038 | 0.7471 |
| QOCA | 4-Low_EF | 343 | 29 | 65.9 | 53.9% | 0.6480 | 0.8143 |
| QOCA | 5-High_PASP | 341 | 29 | 65.9 | 53.4% | 0.5054 | 0.6546 |
| QOCA | 6-High_LA | 340 | 32 | 65.8 | 53.8% | 0.6569 | 0.6808 |
| QOCA | 7-High_NT-proBNP | 308 | 80 | 64.8 | 51.9% | 0.7179 | 0.8167 |
| QOCA | 8-Low_Hb | 471 | 67 | 59.3 | 55.8% | 0.6214 | 0.7226 |
| QOCA | 9-Low_eGFR | 655 | 155 | 60.6 | 53.4% | 0.6435 | 0.7181 |

* Baseline AUC/r refers to the performance of a Lead-I model trained on 12-lead ECGs without exposure to consumer-device data.
* Fine-tuned AUC/r refers to device-specific fine-tuning using a separate training cohort that is not included in this benchmark.
* Details regarding the number of samples used to train the baseline Lead-I model and the device-specific fine-tuned models for each task are not repeated in this repository. Instead, readers are referred to the corresponding peer-reviewed literature listed at the end of this repository, where the full experimental design, cohort sizes, and training protocols are described in detail.
* All model predictions are provided in the 'data-for-predicting/baseline-prediction' folder; you can refer to the corresponding files as a reference.

Model training example

We also provide training code for the baseline models. The full raw ECG datasets and label-linked clinical data are not publicly released because they are derived from electronic health records and are subject to institutional and regulatory restrictions. Researchers with legitimate scientific interests may request access through a formal application process, which requires approval by the Tri-Service General Hospital Institutional Review Board and execution of a research-use data agreement. Approved researchers will be granted remote access, under a secure VPN environment, to a designated server hosted by Tri-Service General Hospital, where model development and analysis may be conducted. To protect data security and intellectual property, no raw data, trained model weights, or executable models may be exported from the secure environment. Only aggregate summary outputs, such as tables, figures, and statistical results, may be retrieved following review and approval.

However, we have included synthetic data to enable researchers to replicate the model training process. Once model training is complete, you can use the HOME dataset to validate your model. Note that this repository is built for the R software environment, specifically version 3.4.4, and uses MXNet version 1.3.0.

The "code/supporting-code" folder contains three scripts, which do not require preloading. You can simply execute the "all-in-one.R" script. Before running this script, make sure to place the "train.csv" file in the "data" folder. The "train.csv" file should include labels indicating whether the ECGs have any diseases. Additionally, within the "data/ecg" folder, please ensure you have the corresponding CSV files describing the waveform of each ECG (15000x1). The unit of waveform data here is 0.01 mV. The sampling frequency is 500 per one second.

Summary of the training samples for the baseline and fine-tuned models

| Device | Task Name | Total Subjects (12-lead ECG) | Cases (12-lead ECG) | Total Subjects (consumer device) | Cases (consumer device) |
|--------|-----------|------------------------------|---------------------|----------------------------------|--------------------------|
| Apple | 1-Gender | 474130 | 254481 | 6618 | 3467 |
| Apple | 2-Age | 474130 | NA | 6618 | NA |
| Apple | 3-Death | 380770 | 18584 | 5585 | 72 |
| Apple | 4-Low_EF | 99745 | 9274 | 4879 | 204 |
| Apple | 5-High_PASP | 99383 | 9084 | 4873 | 326 |
| Apple | 6-High_LA | 99324 | 9227 | 4882 | 273 |
| Apple | 7-High_NT-proBNP | 31931 | 14940 | 288 | 130 |
| Apple | 8-Low_Hb | 324828 | 45115 | 1038 | 210 |
| Apple | 9-Low_eGFR | 261743 | 71970 | 2218 | 735 |
| QOCA | 1-Gender | 474130 | 254481 | 21927 | 11680 |
| QOCA | 2-Age | 474130 | NA | 21927 | NA |
| QOCA | 3-Death | 380770 | 18584 | 18451 | 156 |
| QOCA | 4-Low_EF | 99745 | 9274 | 6244 | 223 |
| QOCA | 5-High_PASP | 99383 | 9084 | 6241 | 359 |
| QOCA | 6-High_LA | 99324 | 9227 | 6243 | 340 |
| QOCA | 7-High_NT-proBNP | 31931 | 14940 | 968 | 236 |
| QOCA | 8-Low_Hb | 324828 | 45115 | 8225 | 727 |
| QOCA | 9-Low_eGFR | 261743 | 71970 | 11616 | 2204 |

Permitted Use

The dataset may be used only for:

  1. Inference-based evaluation of already-trained, fixed models
  2. Benchmarking and comparison of model performance
  3. Non-commercial academic research

Any use outside this scope is explicitly forbidden.

License and Restrictions

The HOME Benchmark is released under a custom Research & Evaluation License.

Key restrictions include:

  1. Non-commercial use only
  2. Evaluation-only usage
  3. No training or representation learning
  4. No redistribution of waveform data
  5. No attempt to infer or reconstruct labels

Use of this dataset constitutes agreement to all license terms.

See the LICENSE file for full details.

Citation Requirement

Any publication, preprint, presentation, or public report using the HOME Benchmark must cite the original benchmark paper:

Bridging the gap from clinical to home ECG: quantifying and overcoming accuracy loss in AI-enabled single-lead ECG models. (Under review)

Failure to cite the benchmark violates the terms of use.
