AI Engineer Intern

Team: R&D | Location: Paris | Type: Full-time

About the Role

Pleias builds highly efficient data tooling for enterprise AI. As an AI Engineer Intern, you'll work across our full stack - Synth (our synthetic data engine), Stratum (our AI-native data layer), and Common Corpus (the largest open, rights-cleared dataset) - and the small, specialized AI models that connect them. Most enterprise AI stalls on data that's raw, siloed, and messy; your job is to help turn it into structured, compliant, training-ready data, and to build the efficient models that run on it - on-premise, inside the customer's secure zone.

What You'll Do

  • Develop and optimize document-processing pipelines for complex PDFs, scans, tables, and forms - with particular focus on European and multilingual document formats
  • Build synthetic data generation pipelines with SYNTH, producing domain-specific training data on demand and closing the data-scarcity gap
  • Train and fine-tune AI models for specialized tasks, across pretraining, mid-training, and post-training
  • Implement verification loops and quality-assurance systems that keep synthetic data reliable and model outputs accurate
  • Engineer data-harmonization solutions that normalize legacy and unstructured sources into shared schemas and ontologies
  • Contribute to dataset curation and provenance tooling
  • Optimize inference for production deployment, prioritizing efficiency, cost, and on-premise constraints

About You

  • Strong foundation in deep learning
  • Advanced Python and PyTorch
  • Experience with NLP tools and libraries
  • Knowledge of inference-optimization techniques

Nice to have: experience with document AI / vision-language models, synthetic data generation, and using (lots of) GPUs :); interest in regulated industries (finance, telecom, public sector, healthcare); experience working across multiple European languages.
