Abstract:
The project focuses on reducing the computational requirements of machine learning models for real-time biomedical inference on low-power embedded devices. We will explore model compression techniques (e.g., quantization-aware training, structured pruning, low-rank factorization) to train and deploy neural networks that preserve task-specific performance (e.g., anomaly detection, feature extraction) under strict memory and latency constraints. ACCESS resources will be used for high-throughput model training, hyperparameter tuning, and resource profiling on CPU and GPU nodes of SDSC Expanse. We will also use project storage to manage clinical waveform datasets and generated model artifacts. Key software tools include PyTorch, TensorRT, TVM, TensorFlow Lite Micro, and custom profiling utilities for embedded deployment. The goal is to benchmark model configurations across performance metrics and produce a suite of deployable models for resource-constrained biomedical applications.
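As a concrete illustration of one of the compression techniques named above, the following is a minimal low-rank factorization sketch. The weight matrix, its dimensions, and the chosen rank are hypothetical, not taken from the project; the point is only to show how truncated SVD shrinks a dense layer's parameter count.

```python
import numpy as np

# Hypothetical dense-layer weight matrix (out_features x in_features);
# sizes are illustrative, not from the project.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: approximate W with two thin factors of rank r.
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # shape (256, r)
B = Vt[:r, :]          # shape (r, 512)

# Replacing y = W @ x with y = A @ (B @ x) cuts both storage and FLOPs:
# parameters drop from 256*512 to r*(256 + 512).
full_params = W.size
low_rank_params = A.size + B.size
print(full_params, low_rank_params)  # 131072 24576
```

In practice the rank is chosen per layer to trade accuracy against the memory and latency budget of the target device, and the factored layers are typically fine-tuned to recover lost accuracy.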