2 Apr 2025

Data Engineer - LLM Pipeline & Data Infrastructure

Industry: See below
Employment type: See below
Hours: See below
Location: Amsterdam
Education level: See below
Organization: Vox AI
Contact person: See below

Information

We're building an AI-powered conversational system for drive-thru automation. As our Data Engineer, you'll design and implement the infrastructure that powers our multi-stage LLM pipeline, from data capture to processing, model training, and deployment.

Tasks
  • Build scalable real-time data pipelines for audio processing, LLM interactions, and model training
  • Design comprehensive data storage solutions across object storage, NoSQL, and analytical databases
  • Implement data quality management with filtering, normalization, and enrichment capabilities
  • Create automated processes for data preparation, model evaluation, and continuous improvement
  • Develop observability systems with monitoring, alerting, and performance dashboards
  • Establish data security and compliance protocols, including privacy protection measures
  • Build resilient data systems with error recovery, backup, and integrity verification

Requirements

What You'll Need

  • Experience designing data pipelines for AI/ML applications
  • Expertise with Apache Airflow for workflow orchestration
  • Strong knowledge of Apache Spark for large-scale data processing
  • Experience with Apache Kafka for real-time event streaming
  • Proficiency with object storage systems (S3/MinIO) and database technologies (Cassandra/ScyllaDB, ClickHouse)
  • Understanding of monitoring tools (OpenTelemetry) and observability platforms
  • Experience implementing data security and compliance measures
  • Advanced Python programming skills

Preferred Experience

  • Audio data processing and conversational AI systems
  • LLM training and fine-tuning pipelines
  • Data quality frameworks (Great Expectations) and versioning tools (LakeFS, DVC)
  • Kubernetes for container orchestration
  • Multi-region deployment and distributed systems

Benefits
  • Build cutting-edge conversational AI systems with real-world impact
  • Work with a modern, open-source technology stack
  • Help shape the future of automated customer service
  • Competitive compensation and flexible work arrangements

If you're passionate about building robust data systems for AI applications and excited by complex real-time data challenges, we'd love to talk.
