Machine Learning & AI Research | Computer Vision | NLP
Rajshahi University of Engineering & Technology
koshik.debanath@gmail.com
Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Medical Image Analysis, Low-Resource Language Processing, Generative AI
Research background and academic profile
B.Sc. in Computer Science and Engineering
Rajshahi University of Engineering & Technology
CGPA: 3.27/4.00 (2018-2023)
• Machine Learning & Deep Learning
• Computer Vision & Medical Imaging
• Natural Language Processing
• Low-Resource Language Processing
I am a software engineer and researcher with expertise in machine learning, computer vision, and natural language processing. My work focuses on developing practical solutions for real-world problems, with experience in medical imaging analysis, low-resource language processing, and deep learning applications. I have contributed to multiple peer-reviewed publications and open-source projects in the field of artificial intelligence.
Python (Expert), C/C++, Java, JavaScript, SQL, MATLAB
PyTorch, TensorFlow, Keras, Scikit-learn, LangChain, Transformers, OpenCV
Generative AI (LLMs, RAG, Fine-tuning), NLP, Computer Vision, Deep Learning, Bayesian Methods, Scientific ML, Explainable AI
Git, Docker, FastAPI, Flask, Django, MLOps, Pinecone, MongoDB, MySQL
Peer-reviewed research in machine learning, computer vision, and natural language processing
Authors: K. Debanath, S. Aich, and A.Y. Srizon
Abstract: Predictive mathematical models of biological processes like wound healing are essential for quantitative understanding, but their clinical utility is often limited by a critical roadblock: uncertainty in their biophysical parameters. These parameters are difficult to measure directly and must be inferred from sparse, noisy data. This paper presents a Bayesian Physics-Informed Neural Network (BPINN) framework to address this challenge by performing robust parameter inference and principled uncertainty quantification. We frame the identification of unknown parameters in a coupled reaction-diffusion system for wound healing as a Bayesian inverse problem. By integrating sparse observational data with the governing physical laws within a variational inference framework, the BPINN learns the full posterior distributions of unknown model parameters. Our results show that the framework accurately infers key reaction parameters from a dataset comprising less than 0.01% of the full spatio-temporal domain. More importantly, the BPINN correctly diagnoses that the cell motility parameter is practically non-identifiable from the sparse data, a conclusion supported by the large posterior uncertainty it assigns. The model’s predictive uncertainty is well-calibrated, being highest in regions far from observations. This work establishes the dual value of BPINNs as a powerful computational tool: both for developing reliable, personalized biomechanical models through data-driven calibration, and for diagnosing parameter identifiability issues, a critical step towards building trustworthy models in computational medicine and systems biology.
Authors: K. Debanath and A.Y. Srizon
Submitted to: Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Abstract—Effective semantic retrieval remains the primary bottleneck for Bengali Retrieval-Augmented Generation (RAG) systems. While general-purpose multilingual models exist, they often lack the semantic alignment required for high-precision tasks. This paper presents a comparative study of three embedding architectures—monolingual (shihab17), distilled multilingual (distiluse), and paraphrase-focused (mpnet)—to identify the strongest retrieval foundation for Bengali. The models are fine-tuned on the BanglaRQA dataset with a composite objective combining Multiple Negatives Ranking Loss and Matryoshka Representation Learning (MRL). Results show that architectural choice is decisive: the fine-tuned mpnet model achieves an NDCG@10 of 0.8114 and statistically outperforms the monolingual baseline (p < 0.001). Beyond retrieval quality, efficiency analysis for low-resource deployment shows that MRL-trained MPNet preserves 96% of retrieval effectiveness at 128 dimensions, reducing storage cost by 83% without meaningful accuracy loss.
CCS Concepts: Computing methodologies → Natural language processing; Neural networks. Information systems → Retrieval models and ranking.
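The storage figure in the abstract above can be checked with simple arithmetic. The following is a minimal sketch, not the paper's code: it assumes the full MPNet embedding has 768 dimensions (an assumption, not stated in the abstract) and uses synthetic vectors in place of real embeddings. With Matryoshka-style training, a shorter embedding is obtained by keeping the leading dimensions and re-normalizing. All function names here are hypothetical.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def storage_reduction(full_dim, kept_dim):
    """Fraction of per-vector storage saved by keeping `kept_dim` dimensions."""
    return 1.0 - kept_dim / full_dim

FULL_DIM = 768  # assumed full embedding size (not given in the abstract)
KEPT_DIM = 128  # truncated size reported in the abstract

# Synthetic stand-ins for a query embedding and a document embedding.
query = [math.sin(0.1 * i) for i in range(FULL_DIM)]
doc = [math.sin(0.1 * i + 0.05) for i in range(FULL_DIM)]

q_small = truncate_and_normalize(query, KEPT_DIM)
d_small = truncate_and_normalize(doc, KEPT_DIM)
score = cosine(q_small, d_small)
reduction = storage_reduction(FULL_DIM, KEPT_DIM)

print(f"similarity at {KEPT_DIM} dims: {score:.4f}")
print(f"storage reduction: {reduction:.1%}")
```

Under the 768-dimension assumption, keeping 128 dimensions stores one sixth of each vector, which matches the reported reduction of about 83%.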
2023 26th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh, pp. 1-6
Abstract: Knee injuries, prevalent in athletic and aging populations, pose significant challenges to healthcare professionals due to their complex nature and the critical function of the knee joint. Early and accurate diagnosis is paramount to ensure effective treatment and minimize long-term complications. Traditional diagnostic methods, including physical examinations and imaging techniques like MRI, require expert interpretation and can sometimes be inconclusive. This study introduces an approach to knee injury classification using deep learning techniques by leveraging convolutional neural networks (CNNs) with Attention Mechanism. This research work integrates powerful feature extraction capabilities of CNN and feature refinement of attention mechanism for the binary and multi-class classification of knee MRI images, with the aim of accurately identifying specific knee injury types. Based on our experiment on two comprehensive knee MRI datasets, our custom CNN model achieved 88% testing accuracy on Dataset-1 (Binary classification) and 77% accuracy on Dataset-2 (Multi-class classification). Meanwhile, the Attention-based CNN model achieved 100% accuracy on Dataset-1 (Binary Classification) and 91% accuracy on Dataset-2 (Multi-Class Classification). This approach not only holds promise for enhancing diagnostic accuracy but also for reducing the time to diagnosis.
Authors: K. Debanath, S. Aich, and A.Y. Srizon
Abstract—Natural language processing (NLP) has witnessed significant advancements in recent years, particularly in improving question-answering (QA) systems for well-resourced languages such as English. However, the development of such systems for low-resource languages, including Bengali, remains insufficiently explored. This study proposes an approach to developing a Bengali QA system utilizing the Llama-3.2-3B-Instruct model, leveraging transfer learning techniques on a synthetic dataset derived from the SQuAD 2.0 benchmark...
Presented at ECCE 2025 Conference • Video & Slides Available
Authors: S. Aich, K. Debanath, and A.Y. Srizon
Abstract—The Bengali language, rich in history and cultural significance, poses unique challenges in Natural Language Processing (NLP) due to its dual-register structure: Sadhu (formal) and Cholit (colloquial). These registers differ significantly in syntax, vocabulary, and usage, complicating tasks such as text classification, translation, and sentiment analysis...
Authors: K. Debanath, S. Aich, and A.Y. Srizon
Abstract—Social media has become a battleground for political discourse, with automated accounts (bots) playing a growing role in shaping public opinion and engagement. In the context of the 2024 U.S. Presidential Election, understanding bot activity is crucial for identifying potential misinformation and manipulation tactics...
Authors: S. Aich, K. Debanath, and A.Y. Srizon
Abstract: Enhancing interpretability without compromising accuracy is a critical challenge in text classification. This research explores the integration of Explainable Artificial Intelligence (XAI) techniques with advanced machine learning models, utilizing the Local Interpretable Model-Agnostic Explanations (LIME) framework to provide transparency. A fine-tuned BERT model achieved state-of-the-art performance, surpassing Random Forest and Sentence Embedding-based models with a perfect 100% accuracy (ROC-AUC score of 1.00). While Random Forest classifiers offered a solid baseline, they struggled with semantic nuances, underscoring the need for embedding-based approaches. The study highlights the inherent trade-off between interpretability and accuracy, demonstrating that while transformer-based models like BERT excel at capturing complex linguistic patterns, their "black-box" nature necessitates tools like LIME for explainability. By bridging this gap, the research contributes to the development of more transparent, reliable, and high-performing AI systems.
Authors: K. Debanath, S. Aich, and A.Y. Srizon
Proceedings of the 3rd International Conference on Big Data, IoT and Machine Learning (BIM 2025), Lecture Notes in Networks and Systems, vol. 1798, pp. 447–457.
Abstract—The stability of modern power grids is increasingly challenged by dynamic disturbances, while conventional data-driven models often require extensive labeled fault data and offer limited interpretability. This work proposes a Physics-Informed Neural Network (PINN) framework for real-time anomaly detection by embedding the swing equation directly into the model loss. To address oscillatory training instability, the method integrates Fourier Feature Mapping with loss weight annealing. Trained only on sparse normal-operation data, the model reconstructs system states accurately and uses the physics residual as an interpretable anomaly signal. Experiments on a Single-Machine Infinite Bus system show instantaneous and precise fault detection, with clearer and more interpretable signals than an LSTM baseline.
Presented at BIM 2025 Conference • Video & Slides Available
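The idea of using the swing-equation residual as an anomaly signal, described in the BIM 2025 abstract above, can be illustrated without a neural network. The sketch below is a simplified stand-in, not the paper's method: it simulates a single-machine infinite-bus rotor angle, then evaluates the residual of M·δ'' + D·δ' = P_m − P_max·sin(δ) with finite differences instead of a trained PINN. All parameter values, the fault model (a temporary drop in P_max), and the threshold are hypothetical.

```python
import math

# Hypothetical per-unit parameters for a single-machine infinite-bus system.
M = 0.1        # inertia constant
D = 0.05       # damping coefficient
P_M = 0.8      # mechanical input power
P_MAX = 2.0    # electrical power transfer limit in normal operation
P_FAULT = 0.5  # reduced transfer limit while the fault is active
DT = 0.001     # time step in seconds
N_STEPS = 2000
FAULT_START, FAULT_END = 1000, 1100  # fault window, in step indices

def simulate():
    """Integrate the swing equation, starting at the normal equilibrium angle."""
    delta = math.asin(P_M / P_MAX)
    omega = 0.0
    angles = [delta]
    for k in range(N_STEPS):
        p_max = P_FAULT if FAULT_START <= k < FAULT_END else P_MAX
        accel = (P_M - p_max * math.sin(delta) - D * omega) / M
        omega += accel * DT
        delta += omega * DT
        angles.append(delta)
    return angles

def physics_residual(angles, k):
    """Residual of the normal-operation swing equation at step k."""
    d1 = (angles[k + 1] - angles[k - 1]) / (2 * DT)
    d2 = (angles[k + 1] - 2 * angles[k] + angles[k - 1]) / (DT * DT)
    return M * d2 + D * d1 - P_M + P_MAX * math.sin(angles[k])

def detect(angles, threshold=0.05):
    """Return step indices where the residual magnitude exceeds the threshold."""
    return [k for k in range(1, len(angles) - 1)
            if abs(physics_residual(angles, k)) > threshold]

angles = simulate()
flagged = detect(angles)
print(f"first flagged step: {flagged[0]}, last flagged step: {flagged[-1]}")
```

Because the residual is computed against the normal-operation physics, it stays near zero while the system obeys those physics and jumps as soon as the fault changes the dynamics. That makes the signal easy to interpret: it directly measures how far the observed trajectory departs from the assumed equation.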
Medical image analysis, object detection, image classification, and attention mechanisms for visual understanding.
Low-resource language processing, multilingual models, and language classification for Bengali and other languages.
Deep learning, neural networks, and AI applications in healthcare, finance, and sports analytics.
Multi-dimensional analysis of research domains and impact
Comprehensive overview showing the distribution of research across different domains, including publications, citations, and impact scores.
April 2025 - Present
March 2023 - April 2025
LSTM models for stock price prediction in Bangladeshi and global markets.
Interactive web application to classify human-written vs AI-generated text.
Chat-based interface for querying custom PDFs using Pinecone and LLaMA-2.
Authored and published two Python libraries to simplify data handling for RAG systems.
Web application to generate image captions using the Google Gemini Pro Vision API.
Implemented and compared multiple time-series models to predict product prices.
Implemented a KNN model using cosine similarity to recommend movies based on user input.
Built a CNN model achieving near-100% accuracy in classifying potato diseases from images.
Constructed an Artificial Neural Network with PyTorch to predict patient diabetes status.
Interactive visualizations and educational tools for understanding complex concepts
Advanced forensic tool to distinguish authentic photographs from AI-generated images (GANs, Diffusion). Uses explainable, physics-inspired and statistical features to detect synthetic "fingerprints" invisible to the naked eye.
An interactive web-based tool to visualize and understand convolution operations in deep learning and image processing.
Interactive tool to visualize attention mechanisms between sentences using various attention types and vector similarity measures.
An AI assistant plugin for Obsidian that allows you to ask questions, get responses, and include page content as context.
RAG framework and context engine for retrieval-augmented generation systems
Open-source AI search and personal assistant platform
DeepLearning.AI - November 2024
A comprehensive short course on the end-to-end development of applications using text embeddings.