S. F. Mona Ebrahimi

| AI | Machine Learning | Data Science | NLP | Researcher | Tampere University / Sharif University of Technology
  • 👋 Hi, I'm Mona — I teach machines to make sense of messy language (and sometimes people).
  • 🎓 Doctoral researcher in Machine Learning & Data Science, fortunate to be supervised by Professor Jaakko Peltonen.
  • 🌍 5+ years building deep learning systems for low-resource and multilingual NLP
  • 🤖 Into LLMs, representation learning, explainable AI, and making models play nicely across modalities
  • 🔬 I blend academic rigor with real-world engineering to make smart things smarter
  • 📬 Let’s connect — always open to research collabs, cool ideas, or just a good AI chat!

Research Experience

Doctoral Researcher

Tampere University

I'm currently pursuing my Ph.D. in Computer Science with a focus on probabilistic machine learning and data science. My research primarily explores social media text analysis but also extends to representation learning and the design of interpretable learning systems. I'm particularly interested in uncovering hidden structures in large, noisy datasets across different modalities.

2023 – Present

Research Assistant

Worked on several NLP projects with a focus on low-resource and morphologically rich languages. Contributed to research on style transfer, language generation, and deep neural models for Persian. My Master's thesis on formality style transfer combined linguistic theory with modern deep learning methods.

2020 – 2022

Independent Research Projects

NLP & Speech

In parallel with my formal research, I've worked on independent projects involving speech processing and automatic speech recognition, along with cross-modal research connecting audio and text. These experiences shaped my broader interest in multimodal learning and the joint modeling of signals across different formats.

Job Experience

NLP Engineer

Iran Telecommunication Research Center (ITRC)

Worked on NLP and Large Language Models (LLMs) for Persian and multilingual applications. Focused on prompt engineering, fine-tuning, and evaluation of LLMs.

2023

Mentor – Speech Recognition & Synthesis

Mentored students in core speech and language processing techniques, following the course taught by Dr. Hossein Sameti. Helped learners build practical skills and guided them through real-world audio processing tasks.

2022

NLP Engineer

Developed Persian NLP tools and pipelines for document processing and intelligent tagging systems. Key projects included:

  • Designed a Persian verb analyzer for downstream NLP tasks
  • Created a rule-based PDF text extraction module for commercial use

2018 – 2019

NLP Engineer / Researcher

Contributed to applied research in NLP and speech, leading the development of real-world tools for Persian text and audio data:

  • Led quality control for large-scale speech datasets
  • Built Romand, a production-ready Persian question-answering (QA) assistant
  • Worked on speech recognition and text classification modules

2020 – 2023

Quality Control Supervisor

Oversaw the QA process for annotation pipelines and speech data alignment, ensuring accuracy and consistency in training corpora.

2020 – 2021

Education

Tampere University

Doctor of Philosophy in Computing and Electrical Engineering (DPCEE)

Faculty of Information Technology and Communication Sciences (ITC)
Focus: Machine Learning, Data Science, and Social Media Analysis

2024 – Present

Sharif University of Technology

Master of Science in Natural Language Processing

Thesis title: Formality Style Transfer Using Deep Neural Networks
Relevant courses: Machine Learning, Artificial Intelligence, Statistical NLP, Computational Linguistics, Syntax, Morphology, Semantics

2019 – 2022

Technical Skills

Programming & Scripting Languages

Python, R, MATLAB, Java

Machine Learning & NLP Frameworks

PyTorch, TensorFlow, Keras, Hugging Face Transformers, scikit-learn, Gensim, spaCy, NLTK

Web & Front-End

HTML, CSS, Bootstrap

Back-End & Databases

Node.js, Express, MongoDB, FastAPI, SQL

DevOps & Cloud

AWS, Docker, Git, GitHub Actions, CI/CD

Data Analysis & Visualization

Pandas, NumPy, Matplotlib, Seaborn

Spoken Languages

Kurdish (Native), Persian (Native), English (Fluent), Finnish (Basic)

Research & Reporting Tools

LaTeX, Jupyter, Overleaf, Microsoft Office

Interests

Representation Learning – Learning structured patterns from data across modalities, especially in language and speech.

Social Media Analysis – Investigating behavioral trends and narratives from large-scale online discourse.

Multimodal Machine Learning – Integrating text, audio, and visual data for richer modeling and understanding.

Speech & Language Technologies – Building systems that understand and generate human language, with a focus on low-resource contexts.

Interpretable & Constrained Models – Designing AI with structure, transparency, and human alignment in mind.

Applied Machine Learning – Transforming research ideas into real-world tools for meaningful impact.

Projects

  • Adaptive AI System for Continual Learning [Capstone, GPT Lab x Bittium, 2025] – Built a multimodal, continually updating AI system using RAG, LoRA/QLoRA, and FastAPI. Integrated speech, vision, and text, and deployed it for real-time use with retrieval from GitHub, arXiv, and Stack Overflow.
  • TUBEDU Project [Healthcare, Mental Health] – Machine learning analysis of Finnish YouTube vlogs to uncover patterns in youth mental health discourse. Focused on representation learning, topic modeling, and clustering.
  • Multilingual Machine-Generated Text Detection [LLMs, Evaluation] – Developed a black-box framework to detect machine-generated text across languages. Published in the Proceedings of SemEval-2024.
  • Fact-Checking in the COVID-19 Era [Fake News, Dataset] – Built a Persian fact-checking corpus from COVID-19 news to support misinformation detection research.
  • Sharif-STR & Sharif-MGTD ACL@SemEval [Benchmark, Transformers] – Participated in SemEval-2024 with transformer-based models for semantic relatedness and text authenticity.
  • Semantic Relatedness for Low-Resource Languages [Cross-lingual NLP] – Designed transformer models to estimate semantic textual relatedness in African and Asian languages.
  • Romand Intelligent Assistant [Dialog Systems] – Created and deployed a production-ready chatbot for real-world Q&A in Persian.
  • Farsquad [Style Transfer, Persian] – Developed a deep learning pipeline and dataset for formality style transfer in Persian.

Other Academic Merits

  • Invited Reviewer – AISTATS 2025
  • Reviewer – TPDL 2025, Finland
  • Technical Facilitator – ICWE 2024, Tampere (AV + session support)
  • Invited Reviewer – SemEval 2024, Mexico City
  • Conference Volunteer & Attendee – EACL 2023, Dubrovnik, Croatia

Certifications

  • Microsoft Power BI – Full pipeline: Power Query, DAX, dashboards, multi-source modeling
  • Full Stack – React, Node.js, MongoDB, REST, GraphQL, CI/CD, Docker