Abstract:
The project focuses on reducing the computational requirements of machine learning models for real-time biomedical inference on low-power embedded devices. We will explore model compression techniques (e.g., quantization-aware training, structured pruning, low-rank factorization) to train and deploy neural networks that preserve task-specific performance (e.g., anomaly detection, feature extraction) under strict memory and latency constraints. ACCESS resources will be used for high-throughput model training, hyperparameter tuning, and resource profiling on CPU and GPU nodes of SDSC Expanse. We will also use project storage to manage clinical waveform datasets and generated model artifacts. Key software tools include PyTorch, TensorRT, TVM, TensorFlow Lite Micro, and custom profiling utilities for embedded deployment. The goal is to benchmark model configurations across performance metrics and produce a suite of deployable models for resource-constrained biomedical applications.
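As a concrete illustration of one of the compression techniques named above, the following is a minimal low-rank factorization sketch. The weight matrix, its dimensions, and the chosen rank are hypothetical, not taken from the project; the point is only to show how truncated SVD shrinks a dense layer's parameter count.

```python
import numpy as np

# Hypothetical dense-layer weight matrix (out_features x in_features);
# sizes are illustrative, not from the project.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: approximate W with two thin factors of rank r.
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # shape (256, r)
B = Vt[:r, :]          # shape (r, 512)

# Replacing y = W @ x with y = A @ (B @ x) cuts both storage and FLOPs:
# parameters drop from 256*512 to r*(256 + 512).
full_params = W.size
low_rank_params = A.size + B.size
print(full_params, low_rank_params)  # 131072 24576
```

In practice the rank is chosen per layer to trade accuracy against the memory and latency budget of the target device, and the factored layers are typically fine-tuned to recover lost accuracy.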