---
title: "ETL Pipeline for Real Estate Listing Syndication (Ligneurs)"
description: "ETL pipeline from PIM Akeneo to real estate portals - multi-format delivery (XML, CSV, JSON) over 4 years of continuous operation."
locale: "en"
canonical: "https://portfolio.josedacosta.info/en/achievements/pipeline-etl-syndication-immobiliere"
source: "https://portfolio.josedacosta.info/en/achievements/pipeline-etl-syndication-immobiliere.md"
html_source: "https://portfolio.josedacosta.info/en/achievements/pipeline-etl-syndication-immobiliere"
author: "José DA COSTA"
date: "2019"
type: "achievement"
slug: "pipeline-etl-syndication-immobiliere"
tags: ["PHP", "Symfony", "Akeneo PIM v2", "REST API", "XML", "CSV", "JSON", "FTP/SFTP", "GitLab CI", "Docker", "Kubernetes", "MySQL"]
generated_at: "2026-04-24T08:31:44.606Z"
---

# ETL Pipeline for Real Estate Listing Syndication (Ligneurs)

ETL pipeline from Akeneo PIM to real estate portals, with multi-format delivery (XML, CSV, JSON) over 4 years of continuous operation.

**Date:** January 2019 - 2023  
**Duration:** ~4 years  
**Role:** Technical Lead then Project Manager  
**Technologies:** PHP, Symfony, Akeneo PIM v2, REST API, XML, CSV, JSON, FTP/SFTP, GitLab CI, Docker, Kubernetes, MySQL

### Key Metrics

- Partner Portals: **Several dozen** - Migrated, integrated and maintained
- Export Formats: **3** - XML, CSV, JSON
- Project Duration: **~4 years** - Continuous evolution
- GitLab Branches: **-** - Features and hotfixes documented
- Daily Volume: **Several thousand** - Listings processed across all portals
- Availability: **99.5%+** - Over 4 years of continuous operation

## Presentation

_Project definition and scope_

### Nature

Automated ETL pipeline (Extract-Transform-Load) for multi-channel real estate ad distribution

### Domain

Real Estate / PropTech - B2B (internal teams, partner portals) and B2C (indirect, end buyers)

### Functional Scope

- Automated data extraction from the Akeneo PIM v2 REST API
- Per-partner format transformation (XML, CSV, JSON)
- FTP/SFTP automated delivery to several dozen partner platforms
- Multi-format image adaptation (4/3, 16/9, panoramic, square)
- Property typology mapping (apartment, house, duplex, triplex, studio, T1-T5+)
- Execution monitoring with email alerts and centralized monitoring system
- Individual partner activation/deactivation capability
- SKU matching algorithm for real vs. manually-created PIM programs
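
The multi-format image adaptation listed above comes down to computing one crop box per target ratio from a single source image. Below is a minimal sketch in Python, used here only for illustration (the production code was PHP/Symfony and is not reproduced). It assumes a centered crop; the names `RATIOS`, `center_crop_box` and `crop_boxes` are hypothetical, and the panoramic variant is omitted because its ratio is not stated.

```python
from fractions import Fraction

# Target ratios named in the scope list. The panoramic ratio is not given
# in the source, so it is deliberately left out of this illustrative table.
RATIOS = {
    "4/3": Fraction(4, 3),
    "16/9": Fraction(16, 9),
    "square": Fraction(1, 1),
}


def center_crop_box(width: int, height: int, ratio: Fraction) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop with the given ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if Fraction(width, height) > ratio:
        # Image is too wide: keep the full height and trim the sides.
        new_w, new_h = int(height * ratio), height
    else:
        # Image is too tall (or already matches): keep the full width.
        new_w, new_h = width, int(width / ratio)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


def crop_boxes(width: int, height: int) -> dict[str, tuple[int, int, int, int]]:
    """Compute one crop box per target ratio from a single source image size."""
    return {name: center_crop_box(width, height, r) for name, r in RATIOS.items()}
```

The resulting boxes can be handed to any image library's crop call. Using exact fractions avoids rounding drift when the source already matches the target ratio.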

### Technology Choices & Rationale

- State of the art in 2019 - Stack aligned with the B2B integration standard at the time: batch ETL and FTP/SFTP were the norm before webhooks and event-driven architectures became mainstream.
- PHP / Symfony - Consistent with the existing backend ecosystem. Symfony Console provided a solid framework for scheduled batch command execution.
- Akeneo PIM v2 - Strategic company choice for product catalog management. Its REST API provided structured access to all program and lot data with versioned endpoints.
- Docker / Kubernetes - Each export job isolated in its own container, preventing resource conflicts between partner modules. K8s on AWS EKS handled scheduling and auto-recovery of failed jobs.
- GitLab CI - Automated the build-test-deploy cycle for each partner module independently, allowing targeted deployments without impacting other active feeds.
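
The structured access provided by the PIM REST API can be illustrated with a pagination loop. This is a sketch in Python for illustration only (production was PHP/Symfony). It assumes the HAL-style envelope (`_links.next.href`, `_embedded.items`) that Akeneo documents for paginated collections, to be checked against the PIM version in use. The `fetch_page` callable, the URLs and the `LOT-…` identifiers are placeholders, and the HTTP layer is replaced by an in-memory dictionary so the example runs without a PIM.

```python
from typing import Callable, Iterator

# Hypothetical type: any function that takes a URL and returns the decoded JSON page.
FetchPage = Callable[[str], dict]


def iter_items(first_url: str, fetch_page: FetchPage, max_pages: int = 10_000) -> Iterator[dict]:
    """Yield every item of a paginated collection by following `_links.next.href`."""
    url = first_url
    for _ in range(max_pages):
        page = fetch_page(url)
        yield from page.get("_embedded", {}).get("items", [])
        next_link = page.get("_links", {}).get("next")
        if not next_link:
            return
        url = next_link["href"]
    # Guard against a server that keeps returning a `next` link forever.
    raise RuntimeError(f"pagination did not terminate within {max_pages} pages")


# In-memory stand-in for the HTTP layer (illustrative data only).
FAKE_PAGES = {
    "/api/rest/v1/products?page=1": {
        "_links": {"next": {"href": "/api/rest/v1/products?page=2"}},
        "_embedded": {"items": [{"identifier": "LOT-001"}, {"identifier": "LOT-002"}]},
    },
    "/api/rest/v1/products?page=2": {
        "_links": {},
        "_embedded": {"items": [{"identifier": "LOT-003"}]},
    },
}

identifiers = [
    item["identifier"]
    for item in iter_items("/api/rest/v1/products?page=1", FAKE_PAGES.__getitem__)
]
```

Injecting `fetch_page` keeps the loop testable and lets retry or authentication logic live in one place.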

### System Architecture

The **"Export Ligneurs"** system is the **automated real estate listing distribution engine** of Groupe Pichet. It extracts program and lot data from the Akeneo PIM, transforms it into the specific format required by each partner (XML, CSV, or JSON), and automatically exports it to real estate distribution platforms.

The system serves as the **critical link between the company's product data and its commercial visibility**: every property listing published on major French real estate portals (SeLoger, LeBonCoin, BienIci, LogicImmo...) passes through this pipeline. Any interruption or data inconsistency directly translates into **lost leads and missed sales opportunities**.

As the **sole technical owner** of this system, I was responsible for all architecture decisions, development, deployment, monitoring, and incident response - with full accountability for a pipeline feeding an estimated **\*\*\*K euros/month in lead acquisition**.

_Figure: Export Ligneurs - System Architecture Overview_

## Objectives, Context, Stakes & Risks

_Strategic vision and constraints_

### Objectives

- Migrate all export feeds from the legacy PIM v1.4 to the new Akeneo PIM v2
- Execute migration partner by partner with business validation at each step
- Verify data consistency between source PIM and feeds sent to portals
- Handle each portal's specificities (image formats, typologies, required fields)
- Automate feed supervision (error alerts, execution reports)
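
The data consistency objective above can be illustrated as a diff between what the PIM holds and what a generated feed contains. This is a minimal sketch in Python for illustration only (production was PHP/Symfony). The `Listing` shape, its `price_eur` field and the report keys are assumptions, not the actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    sku: str
    price_eur: int  # hypothetical field: price in whole euros


def diff_feed(source: list[Listing], feed: list[Listing]) -> dict[str, list[str]]:
    """Compare PIM listings with the listings written to a feed, keyed by SKU."""
    src = {item.sku: item for item in source}
    out = {item.sku: item for item in feed}
    common = src.keys() & out.keys()
    return {
        "missing_in_feed": sorted(src.keys() - out.keys()),
        "unexpected_in_feed": sorted(out.keys() - src.keys()),
        "price_mismatch": sorted(s for s in common if src[s].price_eur != out[s].price_eur),
        # Guardrail: a zero or negative price should never be published.
        "invalid_price": sorted(item.sku for item in feed if item.price_eur <= 0),
    }
```

An empty report across all four keys is a reasonable precondition for delivering the feed; any non-empty key can feed the error alerts.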

### Context

The project was initiated during the **knowledge transfer from Andoni L.** in January 2019. The existing system ran on the legacy PIM v1.4 and needed to be fully migrated to Akeneo PIM v2 while maintaining continuous service to all partner portals.

The migration had to be performed **portal by portal** - each with its own format specifications, required fields, image constraints, and property typology mappings - making it impossible to execute as a single "big bang" migration. Each partner required individual validation by the business teams before going live.
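
The per-portal differences described above (required fields, typology mappings) can be captured as data rather than scattered through code. This is a sketch in Python for illustration only (production was PHP/Symfony). The `PartnerSpec` structure, the portal names and all codes are invented for the example; real portal specifications are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartnerSpec:
    name: str
    typology_codes: dict[str, str]      # internal typology -> partner code
    required_fields: tuple[str, ...]


# Illustrative values only: these are not real portal codes or field lists.
SPECS = {
    "portal_a": PartnerSpec("portal_a", {"apartment": "APT", "house": "HSE", "studio": "STU"}, ("sku", "price", "city")),
    "portal_b": PartnerSpec("portal_b", {"apartment": "1", "house": "2"}, ("sku", "price")),
}


def adapt(listing: dict, spec: PartnerSpec) -> dict:
    """Validate a listing against a partner spec and translate its typology code."""
    missing = [f for f in spec.required_fields if listing.get(f) in (None, "")]
    if missing:
        raise ValueError(f"{spec.name}: missing required fields {missing}")
    typology = listing.get("typology")
    if typology not in spec.typology_codes:
        raise ValueError(f"{spec.name}: no mapping for typology {typology!r}")
    # Return a copy so the same source listing can be adapted for several partners.
    return {**listing, "typology": spec.typology_codes[typology]}
```

Rejecting a listing with an explicit error, rather than sending a partial record, keeps a portal-specific gap from turning into silently wrong data on that portal.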

The system was embedded in a larger data ecosystem: upstream data came from the accounting software and in-house ERPs feeding the PIM, while downstream the feeds connected to around a hundred lead suppliers generating an estimated **1 lead every 2 seconds** across all portals.

### Stakes

The partner portals (SeLoger, LeBonCoin, BienIci...) are **major lead acquisition channels** in the real estate market. Any interruption or error in the feeds directly translates into **lost leads and reduced commercial pipeline**. With several dozen partners to migrate individually, the project required sustained attention over multiple years while maintaining zero downtime on active feeds.

### Risks

- Data Inconsistency - Risk of publishing incorrect prices, wrong images, or missing properties on partner portals - directly impacting buyer trust and commercial results.
- Service Interruption - Any feed failure means properties disappear from partner portals, causing immediate lead loss for the commercial teams.
- Format Divergence - Each portal has unique requirements (image ratios, typology codes, required fields) - a generic approach was impossible.
- API Instability - Akeneo PIM API connection issues could block all exports simultaneously, requiring solid error handling and retry logic.
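
The retry logic mentioned for the API instability risk can be sketched as exponential backoff with jitter. This is a Python illustration only (production was PHP/Symfony). The function names, default delays and the choice of `ConnectionError` as the retryable failure are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Delay before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> T:
    """Call `fn`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            delay = delays[attempt]
            # Full jitter spreads retries so parallel jobs do not hit the API in lockstep.
            sleep(random.uniform(0, delay) if jitter else delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the behaviour can be tested without waiting; in a batch job the default `time.sleep` is sufficient.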

### Key Architecture Decisions

- Modular per-partner architecture - One isolated module per portal instead of a generic engine. Rationale: fault isolation, since a bug in one module cannot affect other partners, plus independent deployment and testing per feed.
- Progressive migration over big-bang - Portal-by-portal migration with business validation at each step. Rationale: blast radius limited to one partner at a time, with immediate rollback capability if issues arise.
- ETL batch processing over real-time streaming - Scheduled batch exports via CRON jobs rather than event-driven publishing. Rationale: partners consumed data via FTP/SFTP drops, not webhooks, so real-time would have added complexity without benefit.
- Multi-format image pre-generation - Pre-generate all image variants centrally rather than on-demand per partner. Rationale: avoids redundant processing of the same image across portals and ensures upstream compliance.
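
The fault-isolation idea behind the modular per-partner decision can be shown in miniature. In production the isolation came from one container per export job; the Python sketch below is only an in-process analogue for illustration, and the registry shape and status strings are hypothetical.

```python
from typing import Callable

# Hypothetical registry entry: partner name -> (enabled flag, export function).
Exporter = Callable[[list[dict]], str]


def run_exports(listings: list[dict], registry: dict[str, tuple[bool, Exporter]]) -> dict[str, str]:
    """Run every enabled partner module; a failure in one is recorded, not propagated."""
    report: dict[str, str] = {}
    for partner, (enabled, export) in registry.items():
        if not enabled:
            # Individual deactivation: the partner is skipped without touching the others.
            report[partner] = "skipped"
            continue
        try:
            export(listings)
            report[partner] = "ok"
        except Exception as exc:  # broad on purpose: isolate any bug inside a module
            report[partner] = f"failed: {exc}"
    return report
```

The returned report is the kind of per-partner status that an alerting step can consume after each run.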

### ETL Data Pipeline

_Figure: Extract-Transform-Load pipeline for partner feed generation_
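
The transform stage of such a pipeline can be sketched as one renderer per output format applied to the same extracted records. This is a Python illustration using only the standard library (production was PHP/Symfony). The field set, element names and `;` delimiter are assumptions; each real portal imposed its own schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

FIELDS = ["sku", "typology", "price", "city"]  # hypothetical field set


def to_json(listings: list[dict]) -> str:
    """Serialize listings as a JSON array, keeping accented characters readable."""
    return json.dumps(listings, ensure_ascii=False)


def to_csv(listings: list[dict]) -> str:
    """Serialize listings as semicolon-delimited CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=";", lineterminator="\n")
    writer.writeheader()
    writer.writerows(listings)
    return buf.getvalue()


def to_xml(listings: list[dict]) -> str:
    """Serialize listings as XML: one <listing> per record, SKU as an attribute."""
    root = ET.Element("listings")
    for listing in listings:
        node = ET.SubElement(root, "listing", sku=str(listing["sku"]))
        for field in FIELDS[1:]:
            ET.SubElement(node, field).text = str(listing[field])
    return ET.tostring(root, encoding="unicode")


# Dispatch table: a partner's configured format selects its renderer.
RENDERERS = {"json": to_json, "csv": to_csv, "xml": to_xml}
```

For example, `RENDERERS["csv"](listings)` produces the text that the load stage would then deliver over FTP/SFTP.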

## The Steps - What I Did

_Chronological progression of the project_

- Phase 1 - Knowledge Transfer & Initial Migration - January 2019
  - Sole technical owner within 2 weeks after handover from Andoni L.
  - Migrated first batch: SeLoger Neuf, LogicImmo, TULN, Paru Vendu
  - Established acceptance checklist reused for all subsequent migrations
- Phase 2 - Feature Development & New Integrations - June - September 2019
  - Integrated BienIci with custom image adaptation
  - Adapted ImmoNeuf feed for 16/9 to 4/3 image conversion
  - Stabilized SeLoger and Knock feeds
- Phase 3 - Stabilization & Critical Fixes - January 2020
  - Pricing validation guardrails added before publication
  - Circuit breaker and exponential backoff on PIM API calls
  - Structured logging to reduce incident diagnosis time
- Phase 4 - New Partners & Continuous Evolution - June 2020 - 2023
  - Created Investimeo and BienIci integrations from scratch
  - Clean removal of Marketshot partner without side effects
  - Resolved NEEDOCS, BienIci and Green Valley anomalies
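
The circuit breaker introduced in Phase 3 follows a well-known pattern, sketched here in Python for illustration only (production was PHP/Symfony). The class name, thresholds and half-open behaviour are assumptions about a typical implementation, not a copy of the actual one.

```python
import time
from typing import Callable


class CircuitOpenError(RuntimeError):
    """Raised when calls are short-circuited because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        """Run `fn` unless the circuit is open; track failures to decide when to open."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("PIM API calls suspended")
            # Cool-down elapsed: half-open state, one trial call is let through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, export jobs fail fast instead of queuing timeouts against an API that is already struggling.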

## Actors & Interactions

_Collaborative ecosystem_

### Coordination and Collaboration

### People Involved

- Andoni L. - Predecessor - Managed the complete knowledge transfer. Achieved autonomous operation of the full export system within 2 weeks, becoming the sole technical reference for the entire scope.
- Gaetan B. - Business referent - Jointly defined and enforced the acceptance criteria for each migration: data accuracy, image compliance, typology mapping. Established a reusable validation checklist.
- Leslie A. - Business referent - Led the functional acceptance process, coordinating between technical fixes and business priorities to maintain migration velocity.
- Franck C. - Manager (N+1) - Reported migration progress, risk assessment, and resource needs. Provided technical recommendations for vendor coordination decisions.
- Sebastien B. - Vendor team - Coordinated production deployment scheduling. Established a deployment protocol: preprod validation, business sign-off, prod deployment, 24h monitoring window.

As the **sole technical owner**, I coordinated directly with business stakeholders, external vendors, and partner portals. Each migration involved defining acceptance criteria, piloting validation cycles, and making go/no-go decisions for production deployment. This required translating technical constraints into business terms and vice versa.

## Results

_Impact for me and for the company_

### For Me

- Full technical ownership of a business-critical system directly impacting revenue
- Autonomous architecture decisions with full accountability for reliability and data accuracy
- 4-year project piloted with business teams, external vendors and several dozen partner portals
- End-to-end lifecycle: architecture, development, deployment, monitoring and incident response
- Cross-functional leadership on validation processes and partner onboarding

### For the Company

- Several dozen partner portals migrated from PIM v1.4 to Akeneo PIM v2 with zero service interruption
- 2 new partner integrations built from scratch (BienIci, Investimeo)
- Several thousand listings processed daily across all partner portals
- 99.5%+ availability over 4 years, average incident resolution under 4 hours
- Standardized property typology across all feeds, reducing data inconsistency reports

## Project Aftermath

_What happened after delivery_

### System Evolution

**Immediate aftermath**: After the 2019 migration wave, the system entered a **continuous maintenance phase** with new partner additions and anomaly resolution as needed.

**Medium term**: Resilience proven over 4 years, handling partner format changes and internal data model evolutions without disruption.

**Long-term perspective**: Became a **foundational piece of infrastructure** feeding the commercial pipeline. Modular architecture allowed scaling across several dozen portals without fundamental redesign, and any developer could add a new partner by following the established patterns.

## Critical Reflection

_Honest retrospective analysis_

### What Worked Well

- Portal-by-portal migration: minimal risk, business validation at each step, immediate rollback
- Modular architecture: easy add/modify/deactivate of feeds without side effects
- Standardized onboarding cut new-partner integration from weeks to days

### What Could Have Been Better

- A centralized monitoring dashboard would have replaced individual email alert checks
- Automated integration tests per partner format would have caught regressions earlier

### If I Had to Redo It Today

- Event-driven approach (Kafka/RabbitMQ) instead of CRON batch, with observability-first monitoring (OpenTelemetry, Grafana Tempo)
- A partner specification registry from day one to halve onboarding time
- Automated integration tests against each partner schema before deployment
- A centralized real-time monitoring dashboard instead of email alerts

### Key Lessons Learned

- In multi-partner systems, there is no "one size fits all": each integration has unique constraints
- Long-running projects require a maintenance mindset from day one
- For revenue-critical systems, observability matters more than trying to prevent every failure

### Additional context

- Cumulative Partner Migration Timeline
- Partner Type Distribution
- Export Format Distribution
- Migration Status Breakdown
- Technical Effort Distribution
- Property Types Handled

## Skills applied

_Technical and soft skills applied_

- **System Architecture & Design** - Designed the complete ETL architecture: modular per-partner pipeline, batch over real-time, centralized multi-format image pre-generation
- **Data Engineering & ETL** - End-to-end ETL pipeline from Akeneo PIM to several dozen portals: extraction, multi-format transformation (XML/CSV/JSON), FTP/SFTP delivery, monitoring
- **Full-Stack Development** - Sole technical owner over 4 years: PHP/Symfony, Akeneo PIM v2 integration, image processing, format generators, monitoring tooling
- **Project Management** - Piloted the portal-by-portal migration over 4 years with standardized onboarding process, acceptance checklists and zero downtime
- **Stakeholder Management** - Coordinated business referents, external vendors and partner portals - go/no-go decisions and translation between technical and business constraints
- **REST API Design** - Integrated the Akeneo PIM v2 REST API for data extraction and several dozen partner endpoints (REST APIs and FTP/SFTP file drops)
- **Problem Solving & Critical Thinking** - Same-day pricing fixes, defensive patterns (circuit breaker, exponential backoff retry), unique per-partner constraints, structured logging for fast diagnosis
- **DevOps & CI/CD** - Docker/Kubernetes deployment with GitLab CI per partner module, enabling zero-downtime migration from PIM v1.4 to v2

## Related journey

_Professional experience linked to this achievement_

- **Technical Lead · Flows and Products: content and enterprise integration**

## Need an ETL syndication pipeline designed?

I delivered a multi-portal ETL syndication pipeline: PIM extraction, multi-format transformation (XML/CSV/JSON), FTP/SFTP delivery and monitoring over 4 years of continuous operation. Let's talk about your context.

