| Abstract: |
This project aims to develop and evaluate a low-resource artificial intelligence (AI) debugging assistant designed to support Bengali-speaking learners in introductory programming environments. Many existing AI-based coding assistants are optimized for English and require significant computational resources, limiting their usefulness in regions where computing infrastructure is constrained. This project addresses these challenges by fine-tuning open-source language models to generate beginner-friendly debugging explanations in Bengali and evaluating their reliability on modest hardware.
ACCESS GPU resources will be used to train and refine staged versions of the model using parameter-efficient fine-tuning methods such as LoRA or QLoRA. Training workflows will involve iterative dataset refinement, model evaluation, and comparison across training stages to improve explanation clarity and correctness. GPU-enabled cloud environments such as Jetstream will support reproducible experimentation, with repeated runs across multiple dataset scales and hyperparameter configurations to evaluate performance, robustness, and resource efficiency.
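As a rough illustration of why parameter-efficient methods such as LoRA suit constrained hardware, the sketch below counts the extra trainable parameters a single LoRA adapter adds to one weight matrix. The layer size (4096) and rank (16) are hypothetical values chosen for the example, not settings from this project.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    LoRA keeps the original d_out x d_in weight frozen and learns two
    low-rank factors: A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen weight matrix size."""
    return lora_added_params(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical square projection layer; these numbers are assumptions.
hidden_size = 4096
rank = 16

print(lora_added_params(hidden_size, hidden_size, rank))   # 131072
print(trainable_fraction(hidden_size, hidden_size, rank))  # 0.0078125
```

Under these assumed values the adapter trains under 1% of the parameters in that layer, which is the property that makes repeated runs across configurations affordable.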
The software stack is expected to include PyTorch, Transformers, PEFT, TRL, and Unsloth for training and inference, along with Python-based evaluation tools and lightweight web interfaces (e.g., Gradio) for usability testing. Model outputs will be scored against structured rubrics that measure correctness, clarity, and pedagogical usefulness.
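One way such a rubric could be represented and aggregated is sketched below. The three criteria come from the description above; the 1 to 5 scale, the `RubricScore` class, and the `summarize` helper are illustrative assumptions rather than the project's actual evaluation tooling.

```python
from dataclasses import dataclass
from statistics import mean

# Criteria named in the abstract; the field names are illustrative.
CRITERIA = ("correctness", "clarity", "pedagogical_usefulness")


@dataclass(frozen=True)
class RubricScore:
    """One rater's scores for one model explanation (assumed 1-5 scale)."""
    correctness: int
    clarity: int
    pedagogical_usefulness: int

    def __post_init__(self):
        # Reject values outside the assumed 1-5 range.
        for name in CRITERIA:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def summarize(scores):
    """Return the mean score per criterion across a list of RubricScore."""
    if not scores:
        raise ValueError("at least one score is required")
    return {name: mean(getattr(s, name) for s in scores) for name in CRITERIA}


# Hypothetical ratings for two explanations from the same model stage.
ratings = [RubricScore(4, 5, 3), RubricScore(2, 3, 3)]
print(summarize(ratings))
```

Keeping the criteria separate, rather than collapsing them into one number, makes it possible to compare training stages on correctness and clarity independently.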
This project builds on ongoing research into equitable AI access and support for low-resource languages. A related case study has been accepted for poster presentation at the 69th Annual Conference of the International Linguistic Association (ILA), scheduled for April 30 – May 2, 2026, at John Jay College, City University of New York, in New York City. In addition, the primary research model associated with this effort has been submitted for peer review to ACM COMPASS, in a submission examining infrastructure and accessibility challenges affecting underrepresented languages. The long-term objective of this work is to develop deployable models that run reliably on commodity hardware and support practical use in educational and self-learning environments. |