Pleias

About the Role

Pleias builds highly efficient data tooling for enterprise AI. As an AI Engineer Intern, you'll work across our full stack - Synth (our synthetic data engine), Stratum (our AI-native data layer), and Common Corpus (the largest open, rights-cleared dataset) - and the small, specialized AI models that connect them. Most enterprise AI stalls on data that's raw, siloed, and messy; your job is to help turn it into structured, compliant, training-ready data, and to build the efficient models that run on it - on-premise, inside the customer's secure zone.

What You'll Do

Develop and optimize document-processing pipelines for complex PDFs, scans, tables, and forms, with a particular focus on European and multilingual document formats
Build synthetic data generation workflows with Synth, producing domain-specific training data on demand and closing the data-scarcity gap
Train and fine-tune AI models for specialized tasks, across pretraining, mid-training, and post-training
Implement verification loops and quality-assurance systems that keep synthetic data reliable and model outputs accurate
Engineer data-harmonization solutions that normalize legacy and unstructured sources into shared schemas and ontologies
Contribute to dataset curation and provenance tooling
Optimize inference for production deployment, prioritizing efficiency, cost, and on-premise constraints

About You

Strong foundation in deep learning
Advanced Python and PyTorch
Experience with NLP tools and libraries
Knowledge of inference-optimization techniques

Nice to have: experience with document AI / vision-language models, synthetic data generation, and working with (lots of) GPUs :); an interest in regulated industries (finance, telecom, public sector, healthcare); and experience working across multiple European languages.
