Abstract:
PRISM is a Graph Neural Network (GNN) framework for predicting cancer drug sensitivity (IC50 values) that incorporates biological pathway structure to improve generalization to unseen cancer types. The system encodes protein-protein interaction networks from STRING directly into the model architecture, with the aim of learning pathway-level patterns that transfer across cancer histologies. The core challenge is that traditional ML methods (XGBoost, Random Forest) perform reasonably on random splits but fail catastrophically on held-out cancer subtypes, which is the clinically relevant scenario: their R² turns negative, meaning they do worse than always predicting the mean IC50. GNNs offer a principled alternative because they learn over biological graph structure rather than memorizing feature correlations.
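As a minimal illustration of the two evaluation regimes contrasted above, the Python sketch below (standard library only) builds leave-one-histology-out splits, in which no cancer type appears in both training and test data, and computes R² directly to show how a model can score below zero. The one-label-per-cell-line input and the function names are illustrative assumptions, not part of the PRISM codebase.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Negative whenever the squared error exceeds that of always
    predicting the mean of y_true.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_histology_out(histologies):
    """Yield (held_out_label, train_idx, test_idx), one split per histology.

    `histologies` is a hypothetical list with one cancer-type label per
    cell line; the held-out label never appears in the training indices.
    """
    by_label = defaultdict(list)
    for i, label in enumerate(histologies):
        by_label[label].append(i)
    for held_out in sorted(by_label):
        train_idx = [i for i, lab in enumerate(histologies) if lab != held_out]
        yield held_out, train_idx, by_label[held_out]


if __name__ == "__main__":
    labels = ["lung", "breast", "lung", "skin", "breast"]
    for held_out, train_idx, test_idx in leave_one_histology_out(labels):
        print(held_out, train_idx, test_idx)
    # Reversed predictions are worse than the mean baseline: R^2 = 1 - 8/2 = -3.0
    print(r2_score([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

For the XGBoost and Random Forest baselines, scikit-learn's `LeaveOneGroupOut`, with the histology labels passed as `groups`, produces the same partition.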
Research Goals:
1. Benchmark GNN architectures (GAT, GCN, GIN) against XGBoost/Random Forest baselines on histology-based and tissue-based generalization splits using GDSC (Genomics of Drug Sensitivity in Cancer) data
2. Develop a multi-task architecture covering 265 drugs with a shared biological encoder, tuned with Ray Tune distributed hyperparameter optimization and ASHA (Asynchronous Successive Halving Algorithm) early stopping
3. Evaluate strategies for integrating biological priors into graph construction, using STRING protein-protein interactions and KEGG pathway membership
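The multi-task setup in goal 2 can be sketched in NumPy: one shared embedding per cell line feeds a separate linear output per drug, and the loss is averaged only over (cell line, drug) pairs that were actually measured. This sketch assumes missing IC50 values are stored as NaN alongside a boolean mask; the random projection standing in for the shared encoder is a placeholder for the GNN, the function name is hypothetical, and all sizes other than the 265 drugs are arbitrary.

```python
import numpy as np


def masked_multitask_mse(pred, target, observed):
    """MSE over observed (cell line, drug) entries only.

    pred, target: arrays of shape (n_cell_lines, n_drugs).
    observed: boolean mask of the same shape; unobserved targets may be NaN
    and are zeroed out before squaring. Assumes at least one observed entry.
    """
    diff = np.where(observed, pred - target, 0.0)
    return float((diff ** 2).sum() / observed.sum())


rng = np.random.default_rng(0)
n_cells, n_genes, d_hidden, n_drugs = 8, 50, 16, 265

# Placeholder shared encoder: random projection + ReLU (a GNN in the real model).
x = rng.normal(size=(n_cells, n_genes))
w_enc = rng.normal(size=(n_genes, d_hidden)) / np.sqrt(n_genes)
z = np.maximum(x @ w_enc, 0.0)  # (n_cells, d_hidden), shared by every drug

# One linear output per drug, all reading the same embedding z.
w_heads = rng.normal(size=(d_hidden, n_drugs)) / np.sqrt(d_hidden)
pred = z @ w_heads  # (n_cells, n_drugs)

# Synthetic log-IC50-like targets with roughly 20% of entries missing (NaN).
target = rng.normal(size=(n_cells, n_drugs))
observed = rng.random((n_cells, n_drugs)) < 0.8
target[~observed] = np.nan

loss = masked_multitask_mse(pred, target, observed)
print(f"masked MSE over {int(observed.sum())} observed pairs: {loss:.3f}")
```

Masking rather than imputing means unmeasured pairs contribute zero loss, and therefore zero gradient, to both the drug heads and the shared encoder.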
In short, we plan to use ACCESS resources as follows:
GPU Computing: PyTorch Geometric GNN training with graph attention over ~5,000 gene nodes; Ray Tune parallel architecture search over the number of layers, hidden dimensions, and attention heads
CPU Computing: Data preprocessing, STRING graph construction (~20K proteins, millions of edges), XGBoost/Random Forest baseline training
Storage: Multi-omics dataset (~2 GB), processed PyTorch tensors (~5 GB), model checkpoints from architecture search (~20 GB), logs (~5 GB). Total: ~32 GB (we budget ~35 GB)
Benchmarking: Compare GNN generalization performance vs traditional ML on challenging cross-cancer-type prediction splits
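For the STRING graph construction step, here is a minimal sketch of turning STRING link rows into the `[2, num_edges]` integer `edge_index` layout that PyTorch Geometric uses. It assumes the whitespace-separated `protein1 protein2 combined_score` format of STRING's protein links files (scores on a 0 to 1000 scale) and a precomputed mapping from STRING protein IDs to gene-node indices. The 700 cutoff matches STRING's "high confidence" level but is an illustrative choice, and the function name and protein IDs are hypothetical.

```python
import io

import numpy as np


def string_edge_index(lines, protein_to_node, min_score=700):
    """Build an undirected edge_index of shape [2, E] from STRING link rows.

    lines: iterable of 'protein1 protein2 combined_score' rows (header allowed).
    protein_to_node: dict mapping STRING protein IDs to gene-node indices;
        proteins outside the modeled gene set are dropped.
    min_score: combined_score cutoff on STRING's 0-1000 scale.
    """
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip the header and blank or malformed rows
        a, b, score = parts[0], parts[1], int(parts[2])
        if score < min_score or a not in protein_to_node or b not in protein_to_node:
            continue
        i, j = protein_to_node[a], protein_to_node[b]
        if i != j:
            edges.add((min(i, j), max(i, j)))  # A-B and B-A collapse to one pair
    pairs = sorted(edges)
    # Store each undirected edge in both directions for message passing.
    src = [i for i, _ in pairs] + [j for _, j in pairs]
    dst = [j for _, j in pairs] + [i for i, _ in pairs]
    return np.array([src, dst], dtype=np.int64).reshape(2, -1)


# Toy input with made-up protein IDs.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.PROT_A 9606.PROT_B 900\n"
    "9606.PROT_B 9606.PROT_A 900\n"
    "9606.PROT_A 9606.PROT_C 150\n"
    "9606.PROT_B 9606.PROT_C 720\n"
)
mapping = {"9606.PROT_A": 0, "9606.PROT_B": 1, "9606.PROT_C": 2}
edge_index = string_edge_index(sample, mapping)
print(edge_index.tolist())  # [[0, 1, 1, 2], [1, 2, 0, 1]]
```

`torch.from_numpy(edge_index)` then gives the `torch.long` tensor PyTorch Geometric expects; `torch_geometric.utils.to_undirected` is an alternative to the manual symmetrization.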
We anticipate the following software requirements:
Containers: Singularity/Apptainer for HPC compatibility
GNN Stack: Python 3.10+, PyTorch 2.0+, PyTorch Geometric 2.4+, Ray Tune
ML Baselines: scikit-learn, XGBoost
Data Processing: pandas, NumPy, NetworkX, h5py
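Because ASHA early stopping governs how many training iterations each Ray Tune trial receives, a simplified pure-Python illustration of its two ingredients may help: geometrically spaced rungs (training-iteration milestones) and a rule that lets a trial continue past a rung only if it ranks in the top 1/reduction_factor of results recorded there. This is a sketch of the idea, not Ray Tune's implementation; in practice the scheduler would be `ray.tune.schedulers.ASHAScheduler`, configured through its `grace_period`, `max_t`, and `reduction_factor` arguments. The function names and values below are hypothetical.

```python
def asha_rungs(grace_period, max_t, reduction_factor):
    """Rung milestones: grace_period * reduction_factor**k, kept below max_t."""
    rungs = []
    t = grace_period
    while t < max_t:
        rungs.append(t)
        t *= reduction_factor
    return rungs


def continues_past_rung(value, recorded, reduction_factor):
    """Simplified promotion rule for a higher-is-better metric (e.g. validation R^2).

    value: the trial's metric on reaching the rung.
    recorded: list of metrics already recorded at this rung by other trials.
    The trial continues only if it ranks within the top
    max(1, n // reduction_factor) of the n results, including its own.
    """
    ranked = sorted(recorded + [value], reverse=True)
    n_keep = max(1, len(ranked) // reduction_factor)
    return value >= ranked[n_keep - 1]


if __name__ == "__main__":
    print(asha_rungs(grace_period=1, max_t=100, reduction_factor=4))  # [1, 4, 16, 64]
    print(continues_past_rung(0.90, [0.50, 0.60, 0.70], reduction_factor=4))  # True
    print(continues_past_rung(0.55, [0.50, 0.60, 0.70], reduction_factor=4))  # False
```

Because each trial is judged as soon as it reaches a rung, rather than waiting for a full cohort as in synchronous successive halving, GPUs freed by stopped trials can be reassigned to new configurations immediately.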
We request 400,000 ACCESS credits for initial architecture search and benchmarking.