The HOME Benchmark is an evaluation-only benchmark designed to assess the cross-device generalization performance of AI models on consumer-grade single-lead ECG waveforms, including recordings from Apple Watch and QOCA ECG102D devices. This benchmark is released to enable fair and standardized model comparison, while explicitly preventing model training, fine-tuning, domain adaptation, or commercial exploitation. Ground-truth labels are intentionally withheld. Model performance is assessed exclusively through a controlled submission and evaluation process.
The HOME Benchmark dataset MUST NOT be used for:
- Model training of any kind
- Fine-tuning or partial fine-tuning
- Domain adaptation or transfer learning
- Self-supervised, contrastive, or representation learning
- Feature extractor pretraining
- Parameter optimization or calibration
This prohibition applies even if ground-truth labels are not provided. Access to waveform data does not imply permission to use the data for learning representations or updating model parameters. The dataset is evaluation-only.
This benchmark is intentionally designed to:
- Enable fair cross-device model comparison
- Prevent implicit domain adaptation
- Avoid benchmark leakage and overfitting
- Protect downstream clinical translation and commercialization
This governance model follows best practices used in high-impact medical AI benchmarks and challenges.
The overall file structure is as follows:
HOME
├── code
│ ├── all-in-one.R
│ ├── supporting-code
│ │ ├── ...
├── data-for-predicting
│ ├── QOCA_ECG102D_waveform.csv
│ ├── Apple_Watch_waveform.csv
│ ├── baseline-prediction
│ │ ├── 1-Gender 12-lead model (Apple).csv
│ │ ├── 1-Gender 12-lead model (QOCA).csv
│ │ ├── 1-Gender fine-tuning model (Apple).csv
│ │ ├── 1-Gender fine-tuning model (QOCA).csv
│ │ ├── ...
├── data-for-training
│ ├── train.csv
│ ├── ecg
│ │ ├── P00001.csv
│ │ ├── P00002.csv
│ │ ├── ...
The HOME Benchmark provides two waveform files corresponding to Apple Watch ('Apple_Watch_waveform.csv') and QOCA ECG102D ('QOCA_ECG102D_waveform.csv') devices. Each file is organized in a wide tabular format, where each column represents a unique individual, and each row corresponds to a time point in the ECG waveform.
Specifically, each file contains 1,000 columns, representing 1,000 unique individuals, and 6,000 rows, representing uniformly sampled signal points from a single-lead (Lead I) ECG recording. Waveform values are expressed in units of 0.01 mV, and the sampling frequency is 200 Hz, so each recording spans 30 seconds. There are no repeated patients within or across files; each ECG waveform corresponds to a distinct individual. Column names are unique identifiers (UIDs), which serve as the primary reference for task-specific evaluation and submission.
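The layout described above can be read with a few lines of code. The following Python sketch is illustrative only (the repository's own scripts are written in R). It assumes the first row of each file holds the UIDs and that there is no extra index column; the function name `load_waveforms` and the tiny in-memory sample are hypothetical stand-ins for the real files.

```python
import csv
import io

SAMPLING_RATE_HZ = 200  # stated in the text above
UNITS_PER_MV = 100      # values are stored in units of 0.01 mV


def load_waveforms(handle):
    """Read a wide waveform CSV (one column per UID, one row per sample).

    Returns a dict mapping each UID to its waveform in millivolts.
    Assumes the header row contains the UIDs (an assumption, not confirmed).
    """
    reader = csv.reader(handle)
    uids = next(reader)
    waveforms = {uid: [] for uid in uids}
    for row in reader:
        for uid, value in zip(uids, row):
            waveforms[uid].append(float(value) / UNITS_PER_MV)
    return waveforms


# Synthetic stand-in for a real file: 2 UIDs, 4 samples each.
demo_file = io.StringIO("UID_A,UID_B\n10,-5\n20,0\n30,5\n40,10\n")
waves = load_waveforms(demo_file)
duration_s = len(waves["UID_A"]) / SAMPLING_RATE_HZ
```

With a real file, `open("data-for-predicting/Apple_Watch_waveform.csv", newline="")` would replace the in-memory sample, and each waveform would contain 6,000 values (30 seconds at 200 Hz).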
For each prediction task, a predefined subset of UIDs is specified as the evaluation cohort for that task. Users are required to:
- Load the waveform data from the appropriate device file (Apple Watch or QOCA ECG102D).
- Select the ECG waveforms corresponding to the task-specific UIDs.
- Perform inference only using a pre-trained, fixed model.
- Generate predictions for the specified UIDs.
Submit the results in the exact format defined by the sample_submission file for each task (located in the 'data-for-predicting/baseline-prediction' folder). Each sample_submission file explicitly defines:
- The required UIDs for that task
- The expected submission format
- The prediction type (e.g., probability or continuous value)
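Because the sample_submission file fixes both the UIDs and the layout, one cautious approach is to copy it and overwrite only the prediction values. The Python sketch below illustrates this under explicit assumptions: the first column holds the UID and the second column holds the prediction. The column names `UID` and `prediction`, the function name `fill_submission`, and the sample content are hypothetical; check the actual sample_submission file for each task before relying on this layout.

```python
import csv
import io


def fill_submission(sample_handle, predictions, out_handle):
    """Copy a sample submission, replacing the second column with predictions.

    `predictions` maps UID -> predicted value. Row order, header, and any
    extra columns are preserved. Raises KeyError if a required UID is missing.
    Assumes column 0 is the UID and column 1 is the prediction (hypothetical).
    """
    reader = csv.reader(sample_handle)
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(next(reader))  # keep the header unchanged
    for row in reader:
        uid = row[0]
        if uid not in predictions:
            raise KeyError(f"missing prediction for UID {uid}")
        writer.writerow([uid, predictions[uid]] + row[2:])


# Hypothetical sample file listing the two UIDs required for a task.
sample = io.StringIO("UID,prediction\nU1,0.5\nU2,0.5\n")
# Model outputs may cover more UIDs than the task needs; extras are ignored.
model_outputs = {"U1": 0.12, "U2": 0.87, "U3": 0.40}
submission = io.StringIO()
fill_submission(sample, model_outputs, submission)
submission_text = submission.getvalue()
```

Driving the output from the sample file, rather than from the model outputs, keeps the UID set and ordering identical to what the evaluation server expects.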
For baseline benchmarks, we provide two kinds of model. The '12-lead model' is trained on Lead-I segments from resting 12-lead ECGs; the 'fine-tuning model' is trained on Lead-I ECG signals from the corresponding device. For this cohort, ground-truth labels are not provided, and model performance is evaluated exclusively through the centralized evaluation pipeline.
Model performance is evaluated only through centralized scoring (https://ailab.ndmutsgh.edu.tw/app/home-benchmark). Users must submit prediction files for evaluation; labels are never released. To submit predictions, users must first apply for an account by writing to Chin Lin (xup6fup0629@gmail.com).
Please include:
- Full name
- Institutional affiliation
- Academic email address
- Intended evaluation task(s)
- Username
Users who wish to become familiar with the evaluation workflow may log in using the following test account:
- Username: test
- Password: test
The test account allows unlimited submissions, but only with the system-provided default example data.
Any modification to the example submission files (including changes to values, UIDs, or formatting) results in automatic rejection, and the modified file cannot be uploaded. The test account is provided solely for demonstration and system familiarization; it does not perform real evaluation.
Researchers who apply for and receive an approved account may submit predictions for official evaluation.
For each approved account:
- Each task is limited to a maximum of 10 submissions
- Submissions beyond this limit will not be evaluated
- The submission cap is strictly enforced
This submission limit is implemented to prevent adaptive probing, iterative attacks, and potential label leakage, thereby preserving the integrity and fairness of the benchmark.
| Device | Task Name | Total Subjects (n) | Cases (n) | Mean Age (years) | Male (%) | Baseline Lead-I AUC/r* | Fine-tuned Model AUC/r* |
|---|---|---|---|---|---|---|---|
| Apple | 1-Gender | 1000 | NA | 62.5 | 52.9% | 0.7068 | 0.7766 |
| Apple | 2-Age | 1000 | NA | 62.5 | 52.9% | 0.4819 | 0.5941 |
| Apple | 3-Death | 824 | 22 | 63.9 | 52.4% | 0.7794 | 0.7934 |
| Apple | 4-Low_EF | 761 | 24 | 64.0 | 53.4% | 0.7358 | 0.8419 |
| Apple | 5-High_PASP | 762 | 49 | 64.0 | 53.4% | 0.7334 | 0.7715 |
| Apple | 6-High_LA | 762 | 35 | 64.0 | 53.4% | 0.7357 | 0.8188 |
| Apple | 7-High_NT-proBNP | 100 | 38 | 68.4 | 50.0% | 0.8107 | 0.8169 |
| Apple | 8-Low_Hb | 172 | 38 | 63.3 | 55.2% | 0.6530 | 0.6967 |
| Apple | 9-Low_eGFR | 383 | 108 | 64.7 | 55.1% | 0.6935 | 0.7385 |
| QOCA | 1-Gender | 1000 | NA | 60.0 | 52.5% | 0.6166 | 0.7869 |
| QOCA | 2-Age | 1000 | NA | 60.0 | 52.5% | 0.3570 | 0.6104 |
| QOCA | 3-Death | 823 | 64 | 61.4 | 52.7% | 0.5038 | 0.7471 |
| QOCA | 4-Low_EF | 343 | 29 | 65.9 | 53.9% | 0.6480 | 0.8143 |
| QOCA | 5-High_PASP | 341 | 29 | 65.9 | 53.4% | 0.5054 | 0.6546 |
| QOCA | 6-High_LA | 340 | 32 | 65.8 | 53.8% | 0.6569 | 0.6808 |
| QOCA | 7-High_NT-proBNP | 308 | 80 | 64.8 | 51.9% | 0.7179 | 0.8167 |
| QOCA | 8-Low_Hb | 471 | 67 | 59.3 | 55.8% | 0.6214 | 0.7226 |
| QOCA | 9-Low_eGFR | 655 | 155 | 60.6 | 53.4% | 0.6435 | 0.7181 |
* Baseline AUC/r refers to the performance of a Lead-I model trained on 12-lead ECGs without exposure to consumer-device data.
* Fine-tuned AUC/r refers to device-specific fine-tuning using a separate training cohort that is not included in this benchmark.
* Details regarding the number of samples used to train the baseline Lead-I model and the device-specific fine-tuned models for each task are not repeated in this repository. Instead, readers are referred to the corresponding peer-reviewed literature listed at the end of this repository, where the full experimental design, cohort sizes, and training protocols are described in detail.
* All baseline model predictions are provided in the 'data-for-predicting/baseline-prediction' folder for reference.
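For readers unfamiliar with the two metrics in the table, the Python sketch below shows how they can be computed on data that does have labels (for example, the synthetic training data), since benchmark labels are withheld and official scores come only from the evaluation server. It assumes that "r" denotes the Pearson correlation coefficient for the continuous Age task and that AUC is the area under the ROC curve for the binary tasks; the server's exact implementation is not described here, and the function names are hypothetical.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen non-case (label 0); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy example with made-up labels and scores.
demo_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
demo_r = pearson_r([50, 60, 70], [48, 61, 74])
```

The pairwise AUC is quadratic in the number of subjects, which is acceptable at the cohort sizes listed in the table.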
We also provide the training code for the baseline models. The full raw ECG datasets and label-linked clinical data are not publicly released because they are derived from electronic health records and are subject to institutional and regulatory restrictions. Researchers with legitimate scientific interests may request access through a formal application process, which requires approval by the Tri-Service General Hospital Institutional Review Board and execution of a research-use data agreement. Approved researchers are granted remote access, through a secure VPN, to a designated server hosted by Tri-Service General Hospital, where model development and analysis may be conducted. To protect data security and intellectual property, no raw data, trained model weights, or executable models may be exported from the secure environment. Only aggregate summary outputs, such as tables, figures, and statistical results, may be retrieved following review and approval.
To let researchers replicate the model training process without this access, the repository includes some synthetic data. Once training is complete, the resulting fixed model can be validated on the HOME dataset. The code in this repository was built with R version 3.4.4 and MXNet version 1.3.0.
The "code/supporting-code" folder contains three scripts. These do not need to be loaded in advance; simply run the "all-in-one.R" script. Before running it, place the "train.csv" file in the "data" folder. The "train.csv" file should include labels indicating whether each ECG has any disease. In addition, the "data/ecg" folder must contain one CSV file per ECG describing its waveform (15,000 × 1). Waveform values are expressed in units of 0.01 mV, and the sampling frequency is 500 Hz, so each recording spans 30 seconds.
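The training ECGs are sampled at 500 Hz (15,000 points), whereas the benchmark waveforms are sampled at 200 Hz (6,000 points), so a model trained at one rate typically needs its input brought to the same rate before inference. The Python sketch below shows one simple way to do this with linear interpolation. It is not the authors' preprocessing, which is implemented in R and not described here. The function name `resample_linear` is hypothetical, and a production pipeline would normally apply an anti-aliasing low-pass filter before downsampling.

```python
def resample_linear(signal, src_hz=500, dst_hz=200):
    """Resample a 1-D signal from src_hz to dst_hz by linear interpolation.

    Output sample i is taken at time i / dst_hz, which falls at fractional
    index i * src_hz / dst_hz in the input. No anti-aliasing filter is
    applied, so this is a simplification rather than a full resampler.
    """
    n_out = len(signal) * dst_hz // src_hz
    step = src_hz / dst_hz
    resampled = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        resampled.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return resampled


# A 30-second ramp at 500 Hz stands in for a real training ECG.
ramp_500hz = [float(i) for i in range(15000)]
ramp_200hz = resample_linear(ramp_500hz)
```

A 15,000-point input yields 6,000 output points, matching the length of the benchmark waveforms.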
| Device | Task Name | Total Subjects (12-lead ECG) | Cases (12-lead ECG) | Total Subjects (consumer device) | Cases (consumer device) |
|---|---|---|---|---|---|
| Apple | 1-Gender | 474130 | 254481 | 6618 | 3467 |
| Apple | 2-Age | 474130 | NA | 6618 | NA |
| Apple | 3-Death | 380770 | 18584 | 5585 | 72 |
| Apple | 4-Low_EF | 99745 | 9274 | 4879 | 204 |
| Apple | 5-High_PASP | 99383 | 9084 | 4873 | 326 |
| Apple | 6-High_LA | 99324 | 9227 | 4882 | 273 |
| Apple | 7-High_NT-proBNP | 31931 | 14940 | 288 | 130 |
| Apple | 8-Low_Hb | 324828 | 45115 | 1038 | 210 |
| Apple | 9-Low_eGFR | 261743 | 71970 | 2218 | 735 |
| QOCA | 1-Gender | 474130 | 254481 | 21927 | 11680 |
| QOCA | 2-Age | 474130 | NA | 21927 | NA |
| QOCA | 3-Death | 380770 | 18584 | 18451 | 156 |
| QOCA | 4-Low_EF | 99745 | 9274 | 6244 | 223 |
| QOCA | 5-High_PASP | 99383 | 9084 | 6241 | 359 |
| QOCA | 6-High_LA | 99324 | 9227 | 6243 | 340 |
| QOCA | 7-High_NT-proBNP | 31931 | 14940 | 968 | 236 |
| QOCA | 8-Low_Hb | 324828 | 45115 | 8225 | 727 |
| QOCA | 9-Low_eGFR | 261743 | 71970 | 11616 | 2204 |
The dataset may be used only for:
- Inference-based evaluation of already-trained, fixed models
- Benchmarking and comparison of model performance
- Non-commercial academic research
Any use outside this scope is explicitly forbidden.
The HOME Benchmark is released under a custom Research & Evaluation License.
Key restrictions include:
- Non-commercial use only
- Evaluation-only usage
- No training or representation learning
- No redistribution of waveform data
- No attempt to infer or reconstruct labels
Use of this dataset constitutes agreement to all license terms.
See the LICENSE file for full details.
Any publication, preprint, presentation, or public report using the HOME Benchmark must cite the original benchmark paper:
Bridging the gap from clinical to home ECG: quantifying and overcoming accuracy loss in AI-enabled single-lead ECG models. (Under review)
Failure to cite the benchmark violates the terms of use.