Hi, I'm Matthew

A Computer Science student who loves Machine Learning and Data!

About Me

Hello, I’m Matthew, and I am a final-semester Computer Science student at Lublin University of Technology. I am an AI/ML Engineer and Data Scientist with a deep passion for developing machine learning and deep learning models. I focus on achieving a profound understanding of concepts, data preprocessing, analysis, feature engineering, hyperparameters, and architectures to design, build, and fine-tune models effectively and precisely. I have a strong affinity for mathematics and English and enjoy solving algorithmic problems, such as those found on LeetCode. I am particularly drawn to projects involving complex and intricate datasets that require a deep and comprehensive approach to creating accurate and well-optimized architectures and pipelines. My work reflects a strong commitment to advancing within the AI landscape.

Skills

Below you will find my technical and miscellaneous skills.

Polish - Native Speaker

99%

English - C2

97%

Spanish & French - A1+

5%

Data analysis, feature engineering, preprocessing, data comprehension

98%

Machine Learning Models (DT, SVM, SVR, RF, ET, KNN)

95%

Boosting Algorithms (XGBoost, CatBoost, LightGBM)

90%

Statistical Models & Regression (Linear, Ridge, Lasso, Elastic Net, Logistic)

90%

Clustering (DBSCAN, K-Means)

85%

Statistical Testing (t-tests, ANOVA, power analysis, correlation)

85%

Neural Networks (MLP, CNN, RNN, LSTM, GRU)

90%

NLP (Transformers, LSTM, tokenization, embeddings)

85%

RAG, LangGraph, LangChain

85%

Data transformations & statistical metrics (Z-transform, mean, variance, std, mode)

90%

Python

95%

PyTorch & Torchvision

90%

TensorFlow & Keras

85%

Scikit-Learn

90%

Data Structures & Algorithms

85%

Numpy & Pandas

85%

Visualization (Matplotlib, Pyplot, Seaborn)

75%

Optuna (Hyperparameter Tuning)

85%

Pillow (Image Processing)

75%

Git

85%

Additionally, I have experience with C++, Java, HTML, CSS, JavaScript, Swift, and PHP, and mainly work on Linux and Windows operating systems.

Linux, Ubuntu, Bash

95%

AWS (SageMaker, S3 buckets, pipelines)

60%

Kubernetes, OpenShift, and Azure

70%

Docker, Docker Compose

90%

Jenkins, Prometheus, Grafana

60%

Jupyter Notebook

95%

PyCharm

85%

Visual Studio Code

80%

IntelliJ IDEA

75%

I utilize these platforms in my daily workflow for efficient machine learning and software development, enabling structured experimentation, clean code management, and scalable project organization.

Projects

Below is a selection of projects that I have worked on.

Advanced RAG System – Scalable Multi-Backend Architecture

This large-scale project features an advanced Retrieval-Augmented Generation (RAG) system built within a RESTful API architecture. It leverages a dual-backend design with Django for user and PostgreSQL database management, and FastAPI for RAG logic within isolated containers to ensure modularity and scalability. The Angular-based frontend integrates seamlessly with both backends, providing a dynamic and responsive interface. Fully dockerized and CI/CD-ready via Jenkins, the system is currently being deployed to Azure Kubernetes Service (AKS), with Prometheus and Grafana monitoring planned for live production environments.

Explore this project on GitHub

Brain Tumor Prediction Model

This brain tumor prediction model harnesses a fine-tuned CNN architecture, leveraging extensive data augmentation and rigorous preprocessing to achieve over 96% accuracy. Outperforming expert radiologists, it integrates advanced deep learning techniques with meticulous optimization to reliably detect and classify tumors. Designed for precision and efficiency, this model serves as a powerful diagnostic tool, supporting early detection and informed medical decisions in critical healthcare scenarios.

Explore this project on GitHub

Depression Predicting Model

The Depression Predicting Model is a machine learning model that predicts whether a person has depression. It is implemented with scikit-learn and makes predictions from a range of input features that were carefully adjusted and refined. It combines robust preprocessing, data analysis, PCA, and feature engineering with an SVM model, chosen because of the high dimensionality and nature of the data. This broad yet straightforward approach is well suited to tabular datasets of moderate complexity, retaining some interpretability while leveraging the strength of SVMs on this binary classification task.

Explore this project on GitHub

Life Expectancy Model

The Life Expectancy Model is a deep learning model implemented in TensorFlow, designed to predict life expectancy based on various input features. The data was carefully adjusted and refined for the problem before the model was created. It employs a relatively shallow architecture with a few dense layers and uses the Adam optimizer for efficient, adaptive gradient descent, minimizing a regression loss to closely align predictions with actual values. The model achieves excellent regression performance, with an average prediction error of approximately ±1.2 years and good confidence intervals, which is a competitive result.

Explore this project on GitHub

Titanic Kaggle Dataset Prepared

This is a prepared dataset for the Kaggle competition on predicting whether a passenger survived the Titanic disaster. The code is highly flexible and includes hints at each step, allowing you to make modifications as needed. It is provided as a notebook, so you can experiment with the code before using the dataset for ML/DL tasks. It covers data engineering, preprocessing, and adjustments tailored to ML and DL architectures, complete with detailed annotations.

Explore this project on GitHub

Work Experience

Here is my work experience so far.

September 2024 – present

Kaggle / Hugging Face Independent Researcher

Created multiple architectures and models from complex Kaggle and Hugging Face datasets, including cleaning and preprocessing. Currently building CI/CD pipelines for integration, maintainability, and reproducibility; learning to deploy Azure Kubernetes clusters based on resource and requirement analysis; and conducting non-commercial research in deep learning.

Blog

A blog is currently in development. It will feature insights, experiences, and knowledge-sharing on machine learning and the architectures of deep learning models.

Contact

Feel free to contact me through the platforms below. I usually respond the same day or the next.