2 Apr 2025

Data Engineer - LLM Pipeline & Data Infrastructure

Industry	See below
Employment type	See below
Hours	See below
Location	Amsterdam
Education level	See below
Organization	Vox AI

Contact person	See below

Information

We're building an AI-powered conversational system for drive-thru automation. As our Data Engineer, you'll design and implement the infrastructure behind our multi-stage LLM pipeline, from data capture through processing and model training to deployment.

Tasks

Build scalable real-time data pipelines for audio processing, LLM interactions, and model training
Design comprehensive data storage solutions across object storage, NoSQL, and analytical databases
Implement data quality management with filtering, normalization, and enrichment capabilities
Create automated processes for data preparation, model evaluation, and continuous improvement
Develop observability systems with monitoring, alerting, and performance dashboards
Establish data security and compliance protocols, including privacy protection measures
Build resilient data systems with error recovery, backup, and integrity verification

Requirements

What You'll Need

Experience designing data pipelines for AI/ML applications
Expertise with Apache Airflow for workflow orchestration
Strong knowledge of Apache Spark for large-scale data processing
Experience with Apache Kafka for real-time event streaming
Proficiency with object storage systems (S3/MinIO) and database technologies (Cassandra/ScyllaDB, ClickHouse)
Understanding of observability tooling (e.g. OpenTelemetry) and monitoring platforms
Experience implementing data security and compliance measures
Advanced Python programming skills

Preferred Experience

Audio data processing and conversational AI systems
LLM training and fine-tuning pipelines
Data quality frameworks (Great Expectations) and versioning tools (LakeFS, DVC)
Kubernetes for container orchestration
Multi-region deployment and distributed systems

Benefits

Build cutting-edge conversational AI systems with real-world impact
Work with a modern, open-source technology stack
Help shape the future of automated customer service
Competitive compensation and flexible work arrangements

If you're passionate about building robust data systems for AI applications and excited by complex real-time data challenges, we'd love to talk.
