LLM Semantic QA · Cultural & Factuality Evaluation · AI Language Behavior Analysis
I work on qualitative evaluation of large language models, focusing on failure modes that are often missed by standard benchmarks: semantic over-interpretation, cultural misrecognition, pragmatic errors, and confidence miscalibration.
My background combines:
- multilingual linguistic QA (EN/ES)
- editorial accuracy and content evaluation
- applied analysis of how LLMs interpret, infer, and sometimes overreach
Rather than optimizing prompts for output quality alone, I document where and why models fail to understand language as humans do, especially in culturally localized or implicit contexts.
Focus areas:
- Semantic and pragmatic QA for LLMs
- Cultural reference misrecognition (implicit humor, local context)
- Factuality vs. coherence analysis
- Model confidence calibration (when models should ask instead of infer)
- Spanish (Spain)–specific AI evaluation
Selected repositories:
- prompt-eval-cases: Real-world prompt cases highlighting semantic and cultural failure modes (EN/ES); a sample case record is sketched below
- factuality-bias-review: Annotated examples of hallucinations, inconsistencies, and bias patterns
- qa-linguistic-checklist: Practical QA criteria for multilingual AI content
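As a rough illustration of the kind of record a collection like prompt-eval-cases might hold, the sketch below uses Python to capture one annotated failure case. The class name, field names, and the sample case are hypothetical and are not taken from the repository; they only show how a prompt, the model output, the observed failure mode, and the expected behavior can be stored together.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class PromptEvalCase:
    """One annotated failure case (illustrative schema, not the repository's actual format)."""
    case_id: str
    language: str           # locale of the prompt, e.g. "es-ES" or "en"
    prompt: str             # the prompt given to the model
    model_output: str       # what the model actually produced
    failure_mode: str       # e.g. "cultural misrecognition", "overconfident inference"
    expected_behavior: str   # what a better-calibrated response would look like
    notes: str = ""


# Hypothetical case: the model reads a Spanish idiom literally instead of
# recognizing it or asking for more context.
case = PromptEvalCase(
    case_id="es-001",
    language="es-ES",
    prompt="¿Qué quiere decir aquí 'estar en Babia'?",
    model_output="It refers to being physically located in Babia, a comarca in the province of León.",
    failure_mode="cultural misrecognition / literal reading of an idiom",
    expected_behavior="Explain the idiomatic sense ('to be absent-minded, daydreaming') or ask for context.",
)

print(json.dumps(asdict(case), ensure_ascii=False, indent=2))
```

Storing each case as structured data in this way makes it straightforward to filter by language or failure mode when reviewing patterns across many prompts.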
Education and experience:
- BA in History, University of Alcalá (2006)
- Darmasiswa scholar, Indonesia (Surabaya, 2009–2010)
- 14+ years working with international clients in linguistic quality and content review
Open to remote collaboration in:
- LLM evaluation and QA
- Semantic & cultural error analysis
- Linguistic and bilingual (EN/ES) model evaluation
Email: alejandro.remeseiro(at)gmail.com
LinkedIn: https://www.linkedin.com/in/alejandro-remeseiro-fern%C3%A1ndez-44a02427/
Upwork profile: https://www.upwork.com/freelancers/~015bb79bd0df3c5e7f