AI Engineer Intern
About the Role
Pleias builds highly efficient data tooling for enterprise AI. As an AI Engineer Intern, you'll work across our full stack - Synth (our synthetic data engine), Stratum (our AI-native data layer), and Common Corpus (the largest open, rights-cleared dataset) - and the small, specialized AI models that connect them. Most enterprise AI stalls on data that's raw, siloed, and messy; your job is to help turn it into structured, compliant, training-ready data, and to build the efficient models that run on it - on-premise, inside the customer's secure zone.
What You'll Do
- Develop and optimize document-processing pipelines for complex PDFs, scans, tables, and forms - with particular focus on European and multilingual document formats
- Build synthetic data generation pipelines with SYNTH, producing domain-specific training data on demand and closing the data-scarcity gap
- Train and fine-tune AI models for specialized tasks, across pretraining, mid-training, and post-training
- Implement verification loops and quality-assurance systems that keep synthetic data reliable and model outputs accurate
- Engineer data-harmonization solutions that normalize legacy and unstructured sources into shared schemas and ontologies
- Contribute to dataset curation and provenance tooling
- Optimize inference for production deployment, prioritizing efficiency, cost, and on-premise constraints
About You
- Strong foundation in deep learning
- Advanced Python and PyTorch
- Experience with NLP tools and libraries
- Knowledge of inference-optimization techniques
Nice to have: experience with document AI / vision-language models, synthetic data generation, and using (lots of) GPUs :) Also a plus: interest in regulated industries (finance, telecom, public sector, healthcare) and experience working across multiple European languages.