ETL Pipeline for Real Estate Listing Syndication (Ligneurs)

ETL pipeline from the Akeneo PIM to real estate portals - multi-format delivery (XML, CSV, JSON) over 4 years of continuous operation.

January 2019 - 2023
~4 years
Technical Lead then Project Manager
PHP · Symfony · Akeneo PIM v2 · REST API · XML · CSV · JSON · FTP/SFTP · GitLab CI · Docker · Kubernetes (K8s) · MySQL

Partner Portals

Several dozen

Migrated, integrated and maintained

Export Formats

3

XML, CSV, JSON

Project Duration

~4 years

Continuous evolution

Availability

99.5%+

Over 4 years of continuous operation

Presentation

Project definition and scope

System Overview

The "Export Ligneurs" system is Groupe Pichet's automated real estate listing distribution engine. It extracts program and lot data from the Akeneo PIM, transforms it into the specific format required by each partner (XML, CSV, or JSON), and automatically delivers it to real estate distribution platforms.
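At a high level, the Extract-Transform-Load cycle can be sketched as follows. The production system is a PHP/Symfony application; this is a minimal illustrative Python sketch, and every record field, function name, and format shown here is hypothetical:

```python
# Illustrative ETL sketch: extract raw PIM records, transform them into one
# partner format (CSV here), and hand the result to a delivery callback
# (FTP/SFTP in production). All names and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Listing:
    sku: str
    price: int       # in euros
    typology: str    # e.g. "T3"

def extract(raw_records):
    """Extract: map raw PIM API records to internal listings."""
    return [Listing(r["sku"], r["price"], r["typology"]) for r in raw_records]

def transform_csv(listings):
    """Transform: render one partner-specific format (CSV here)."""
    header = "sku;price;typology"
    rows = [f"{l.sku};{l.price};{l.typology}" for l in listings]
    return "\n".join([header] + rows)

def run_export(raw_records, deliver):
    """Load: pass the rendered feed to the partner's delivery channel."""
    feed = transform_csv(extract(raw_records))
    deliver(feed)
    return feed
```

In the real pipeline each of the three stages was considerably richer (pagination against the PIM API, per-partner transformers, SFTP drops), but the shape of the flow is the same.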

The system serves as the critical link between the company's product data and its commercial visibility: every property listing published on major French real estate portals (SeLoger, LeBonCoin, BienIci, LogicImmo...) passes through this pipeline. Any interruption or data inconsistency directly translates into lost leads and missed sales opportunities.

As the sole technical owner of this system, I was responsible for all architecture decisions, development, deployment, monitoring, and incident response - with full accountability for a pipeline feeding an estimated ***K euros/month in lead acquisition.

Nature

Automated ETL pipeline (Extract-Transform-Load) for multi-channel real estate ad distribution

Domain

Real Estate / PropTech - B2B (internal teams, partner portals) and B2C (indirect, end buyers)

Functional Scope
  • Automated data extraction from the Akeneo PIM v2 REST API
  • Per-partner format transformation (XML, CSV, JSON)
  • FTP/SFTP automated delivery to several dozen partner platforms
  • Multi-format image adaptation (4/3, 16/9, panoramic, square)
  • Property typology mapping (apartment, house, duplex, triplex, studio, T1-T5+)
  • Execution monitoring with email alerts and centralized monitoring system
  • Individual partner activation/deactivation capability
  • SKU matching algorithm for real vs. manually-created PIM programs
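As an illustration of the typology-mapping point above: each portal expects its own codes for the same internal typology, so the mapping is best kept as data rather than logic. A hedged Python sketch, with entirely invented partner names and codes:

```python
# Hypothetical per-partner typology mapping: every code and partner name here
# is invented for illustration; real portals each had their own vocabulary.
INTERNAL_TYPOLOGIES = {"studio", "T1", "T2", "T3", "T4", "T5+", "house", "duplex"}

PARTNER_TYPOLOGY_MAP = {
    "partner_a": {"studio": "ST", "T1": "1P", "T2": "2P", "T3": "3P"},
    "partner_b": {"studio": "studio", "T1": "t1", "T2": "t2", "T3": "t3"},
}

def map_typology(partner: str, typology: str) -> str:
    """Translate an internal typology into the partner's code, failing loudly on gaps."""
    if typology not in INTERNAL_TYPOLOGIES:
        raise ValueError(f"unknown internal typology: {typology}")
    try:
        return PARTNER_TYPOLOGY_MAP[partner][typology]
    except KeyError:
        raise KeyError(f"no mapping for {typology!r} on partner {partner!r}")
```

Failing loudly on a missing mapping, rather than silently dropping the field, is what lets a gap surface during per-partner validation instead of on the live portal.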
System Architecture
Export Ligneurs - System Architecture Overview
Technology Choices & Rationale

State of the art in 2019

Stack aligned with the B2B integration standard at the time: batch ETL and FTP/SFTP were the norm before webhooks and event-driven architectures became mainstream.

PHP / Symfony

Consistent with the existing backend ecosystem. Symfony Console provided a solid framework for scheduled batch command execution.

Akeneo PIM v2

Strategic company choice for product catalog management. Its REST API provided structured access to all program and lot data with versioned endpoints.

Docker / Kubernetes

Each export job isolated in its own container, preventing resource conflicts between partner modules. K8s on AWS EKS handled scheduling and auto-recovery of failed jobs.

GitLab CI

Automated the build-test-deploy cycle for each partner module independently, allowing targeted deployments without impacting other active feeds.

Objectives, Context, Stakes & Risks

Strategic vision and constraints

Objectives
  1. Migrate all export feeds from the legacy PIM v1.4 to the new Akeneo PIM v2
  2. Execute the migration partner by partner, with business validation at each step
  3. Verify data consistency between the source PIM and the feeds sent to portals
  4. Handle each portal's specificities (image formats, typologies, required fields)
  5. Automate feed supervision (error alerts, execution reports)
Context

The project was initiated during the knowledge transfer from Andoni L. in January 2019. The existing system ran on the legacy PIM v1.4 and needed to be fully migrated to Akeneo PIM v2 while maintaining continuous service to all partner portals.

The migration had to be performed portal by portal - each with its own format specifications, required fields, image constraints, and property typology mappings - making it impossible to execute as a single "big bang" migration. Each partner required individual validation by the business teams before going live.

The system was embedded in a larger data ecosystem: upstream data came from the accounting software and in-house ERPs feeding the PIM, while downstream the feeds connected to around a hundred lead suppliers generating an estimated 1 lead every 2 seconds across all portals.

Stakes

The partner portals (SeLoger, LeBonCoin, BienIci...) are major lead acquisition channels in the real estate market. Any interruption or error in the feeds directly translates into lost leads and reduced commercial pipeline. With several dozen partners to migrate individually, the project required sustained attention over multiple years while maintaining zero downtime on active feeds.

Risks

Data Inconsistency

Risk of publishing incorrect prices, wrong images, or missing properties on partner portals - directly impacting buyer trust and commercial results.

Service Interruption

Any feed failure means properties disappear from partner portals, causing immediate lead loss for the commercial teams.

Format Divergence

Each portal has unique requirements (image ratios, typology codes, required fields) - a generic approach was impossible.

API Instability

Akeneo PIM API connection issues could block all exports simultaneously, requiring robust error handling and retry logic.

Key Architecture Decisions

Modular per-partner architecture

Decision: One isolated module per portal instead of a generic engine

Rationale: Fault isolation: a bug in one module cannot affect other partners. Independent deployment and testing per feed.
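The pattern can be sketched as a common interface plus a registry of partner modules. This is an illustrative Python sketch (the real modules were PHP/Symfony commands), and all class and partner names are hypothetical:

```python
# Sketch of the modular layout: each partner ships an isolated exporter behind
# a common interface, so feeds can be added, deactivated, or deployed
# independently. All names here are illustrative.
from abc import ABC, abstractmethod

class PartnerExporter(ABC):
    name: str
    active: bool = True

    @abstractmethod
    def render(self, listings: list) -> str:
        """Render listings into this partner's format (XML, CSV, or JSON)."""

class CsvExporter(PartnerExporter):
    name = "example_csv_partner"
    def render(self, listings):
        return "\n".join(f"{l['sku']};{l['price']}" for l in listings)

REGISTRY: dict[str, PartnerExporter] = {}

def register(exporter: PartnerExporter):
    REGISTRY[exporter.name] = exporter

def run_all(listings):
    """Run every active partner feed; one failure must not stop the others."""
    results, errors = {}, {}
    for name, exporter in REGISTRY.items():
        if not exporter.active:
            continue
        try:
            results[name] = exporter.render(listings)
        except Exception as exc:  # fault isolation: record, don't propagate
            errors[name] = exc
    return results, errors
```

The per-partner try/except is the fault-isolation property in miniature: a crash in one module is recorded and reported, while every other feed still ships.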

Progressive migration over big-bang

Decision: Portal-by-portal migration with business validation at each step

Rationale: Blast radius limited to one partner at a time, with immediate rollback capability if issues arise.

ETL batch processing over real-time streaming

Decision: Scheduled batch exports via CRON jobs rather than event-driven publishing

Rationale: Partners consumed data via FTP/SFTP drops, not webhooks. Real-time would have added complexity without benefit.

Multi-format image pre-generation

Decision: Pre-generate all image variants centrally rather than on-demand per partner

Rationale: Avoids redundant processing of the same image across portals and ensures upstream compliance.
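The geometry behind such pre-generation can be sketched as a center-crop computation per target ratio. This is a hypothetical Python sketch of the ratio math only; actual resizing in production would go through an imaging library:

```python
# Center-crop box computation for pre-generating image variants (4/3, 16/9,
# square...). Illustrative only: variant names and the helper are invented.
def center_crop_box(width, height, ratio_w, ratio_h):
    """Return (left, top, right, bottom) of the largest centered crop
    matching ratio_w:ratio_h inside a width x height image."""
    target = ratio_w / ratio_h
    if width / height > target:          # image too wide: trim the sides
        new_w = round(height * target)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    else:                                # image too tall: trim top and bottom
        new_h = round(width / target)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)

VARIANTS = {"4_3": (4, 3), "16_9": (16, 9), "square": (1, 1)}

def all_variants(width, height):
    """Pre-compute every partner-facing crop once per source image."""
    return {name: center_crop_box(width, height, w, h)
            for name, (w, h) in VARIANTS.items()}
```

Computing all crops once per source image, then reusing them across portals, is what avoids the redundant per-partner processing mentioned above.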

ETL Data Pipeline
Extract-Transform-Load pipeline for partner feed generation

The Steps - What I Did

Chronological progression of the project

Phase 1
Knowledge Transfer & Initial Migration
January 2019
  • Sole technical owner within 2 weeks after handover from Andoni L.
  • Migrated first batch: SeLoger Neuf, LogicImmo, TULN, Paru Vendu
  • Established acceptance checklist reused for all subsequent migrations
Phase 2
Feature Development & New Integrations
June - September 2019
  • Integrated BienIci with custom image adaptation
  • Adapted ImmoNeuf feed for 16/9 to 4/3 image conversion
  • Stabilized SeLoger and Knock feeds
Phase 3
Stabilization & Critical Fixes
January 2020
  • Pricing validation guardrails added before publication
  • Circuit breaker and exponential backoff on PIM API calls
  • Structured logging to reduce incident diagnosis time
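The circuit-breaker and exponential-backoff pattern from this phase can be sketched like this. An illustrative Python sketch (the production version lived in PHP), with hypothetical thresholds and delays:

```python
# Retry with exponential backoff, plus a simple circuit breaker that stops
# hammering the PIM API after repeated failures. Thresholds are illustrative.
import time

class CircuitOpenError(Exception):
    pass

class ApiClient:
    def __init__(self, call, max_retries=3, base_delay=1.0,
                 failure_threshold=5, sleep=time.sleep):
        self.call = call
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.sleep = sleep               # injectable for testing

    def request(self, *args):
        if self.consecutive_failures >= self.failure_threshold:
            raise CircuitOpenError("circuit open: too many consecutive failures")
        for attempt in range(self.max_retries + 1):
            try:
                result = self.call(*args)
                self.consecutive_failures = 0   # success closes the circuit
                return result
            except Exception:
                if attempt == self.max_retries:
                    self.consecutive_failures += 1
                    raise
                self.sleep(self.base_delay * 2 ** attempt)  # 1s, 2s, 4s...
```

Once the circuit opens, export runs fail fast with a clear error instead of piling timeouts onto an already struggling API, which keeps incident diagnosis short.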
Phase 4
New Partners & Continuous Evolution
June 2020 - 2023
  • Created Investimeo and BienIci integrations from scratch
  • Clean removal of Marketshot partner without side effects
  • Resolved NEEDOCS, BienIci and Green Valley anomalies

Actors & Interactions

Collaborative ecosystem

Coordination and Collaboration

As the sole technical owner, I coordinated directly with business stakeholders, external vendors, and partner portals. Each migration involved defining acceptance criteria, piloting validation cycles, and making go/no-go decisions for production deployment. This required translating technical constraints into business terms and vice versa.

Andoni L. (Predecessor) · Gaetan B. (Business referent) · Leslie A. (Business referent) · Franck C. (Manager, N+1) · Sebastien B. (Vendor team)

Results

Impact for me and for the company

For Me
  • Full technical ownership of a business-critical system directly impacting revenue
  • Autonomous architecture decisions with full accountability for reliability and data accuracy
  • 4-year project piloted with business teams, external vendors and several dozen partner portals
  • End-to-end lifecycle: architecture, development, deployment, monitoring and incident response
  • Cross-functional leadership on validation processes and partner onboarding
For the Company
  • Several dozen partner portals migrated from PIM v1.4 to Akeneo PIM v2 with zero service interruption
  • 2 new partner integrations built from scratch (BienIci, Investimeo)
  • Several thousand listings processed daily across all partner portals
  • 99.5%+ availability over 4 years, average incident resolution under 4 hours
  • Standardized property typology across all feeds, reducing data inconsistency reports
Partner Type Distribution
Export Format Distribution

Project Aftermath

What happened after delivery

System Evolution

Immediate aftermath: After the 2019 migration wave, the system entered a continuous maintenance phase with new partner additions and anomaly resolution as needed.

Medium term: Resilience proven over 4 years, handling partner format changes and internal data model evolutions without disruption.

Long-term perspective: Became a foundational piece of infrastructure feeding the commercial pipeline. Modular architecture allowed scaling across several dozen portals without fundamental redesign, and any developer could add a new partner by following the established patterns.

Technical Effort Distribution

Critical Reflection

Honest retrospective analysis

What Worked Well
  • Portal-by-portal migration: minimal risk, business validation at each step, immediate rollback
  • Modular architecture: easy add/modify/deactivate of feeds without side effects
  • Standardized onboarding cut new-partner integration from weeks to days
What Could Have Been Better
  • A centralized monitoring dashboard would have replaced individual email alert checks
  • Automated integration tests per partner format would have caught regressions earlier
If I Had to Redo It Today
  • Event-driven approach (Kafka/RabbitMQ) instead of CRON batch, with observability-first monitoring (OpenTelemetry, Grafana Tempo)
  • A partner specification registry from day one to halve onboarding time
  • Automated integration tests against each partner schema before deployment
  • A centralized real-time monitoring dashboard instead of email alerts
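As an example of what such an automated pre-deployment check could look like, here is a hedged Python sketch that validates required fields in a generated XML feed before delivery. The field names and feed structure are invented for illustration:

```python
# Minimal feed check: confirm the generated XML is well-formed and that each
# listing carries the partner's required fields. Field names are hypothetical.
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = {"sku", "price", "typology"}

def validate_feed(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the feed passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for i, listing in enumerate(root.findall("listing")):
        present = {child.tag for child in listing}
        for missing in sorted(REQUIRED_FIELDS - present):
            problems.append(f"listing {i}: missing <{missing}>")
    return problems
```

Run per partner in CI, a check like this catches a missing required field at build time rather than as a rejection (or silent drop) on the portal side.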
Key Lessons Learned
  • In multi-partner systems, no "one size fits all" - each integration has unique constraints
  • Long-running projects require a maintenance mindset from day one
  • For revenue-critical systems, observability matters more than preventing every failure

Related journey

Professional experience linked to this achievement

Skills applied

Technical and soft skills applied

Image gallery

Project screenshots and visuals

Need an ETL syndication pipeline designed?

I delivered a multi-portal ETL syndication pipeline: PIM extraction, multi-format transformation (XML/CSV/JSON), FTP/SFTP delivery and monitoring over 4 years of continuous operation. Let's talk about your context.

Contact me