Multi-Supplier Product Data Import System

Critical data import engine for the European Sourcing B2B marketplace - 254+ suppliers, 120+ database tables, 8 CSV import types, 5 languages, 7 years of development (2009-2016).

2009 - 2016
~7 years
Software Engineer then Senior Software Engineer
PHP 5.x, MySQL, Symfony 2.x, Smarty, jQuery 1.7, Bootstrap, Docker, Vagrant, Apache, Memcached, ProxySQL, PostgreSQL, FTP, CSV, XML, SVN, Git, DomPDF, PHPMailer

Lines of Code: 131,765 (PHP application code, excl. third-party)
Suppliers: 254+ (import folders identified)
Database Tables: 120+ (MySQL)
Languages: 5 (FR, EN, DE, ES, IT)

Presentation & Definition

A centralized import engine for the European promotional products market

The Import European Sourcing project is the data import and product management backbone of the European Sourcing platform - a major B2B marketplace connecting suppliers of promotional products (manufacturers, importers) with distributors/resellers across Europe.

The system ingests and normalizes product catalogs from 254+ suppliers, delivered in heterogeneous formats (CSV and XML files, often retrieved over FTP). It runs them through validation and transformation pipelines and writes them into a centralized MySQL database that powers a multilingual search engine across 5 European languages.

The import system evolved through two major generations:

  • Legacy (2007-2016): A monolithic PHP application built on a custom MVC framework (SQLI), with batch scripts for catalog import and FTP-based auto-update
  • Modern (2016-2019): A Symfony-based extranet (v2) introducing asynchronous import queues, structured CSV schemas for 8 import types, and real-time progress tracking
Business Domain

B2B promotional products & merchandise - an industry connecting European manufacturers of branded goods (pens, textiles, office accessories, gifts) with resellers who customize and sell them to end clients.

Functional Scope

Bulk import of supplier catalogs (docs, product data, variants, markings, pricing options, stock levels), automated FTP/XML/CSV feed updates, supplier profile management, catalog PDF generation, and subscription management.

Objectives, Context, Stakes & Risks

Understanding the business drivers behind the technical challenge

Objectives
  • Centralize catalogs from 250+ European suppliers into a single normalized database
  • Automate product data updates (prices, stock, references) via FTP/XML/CSV feeds
  • Provide a multilingual extranet (5 languages: FR, EN, DE, ES, IT) for supplier and distributor management
  • Generate custom exports in CSV and PDF for distributors and sales teams
  • Deliver an advanced search engine with phonetic search, synonyms, and multi-criteria filtering
Context

The project was initially developed by SQLI (a French IT services company) on a custom PHP framework with SQL Server, later migrated to MySQL. The initial architecture (circa 2007-2010) relied on a custom MVC pattern with Smarty templating.

The project was owned by Medialeads (the company behind European Sourcing), whose GitHub organization is `github.com/medialeads`. Between 2016 and 2019, a modernized extranet (v2) was developed on Symfony with Bootstrap, Docker/Vagrant for development, and an asynchronous import queue system.

The infrastructure relied on a front/back MySQL architecture with load balancing: a front MySQL server (192.168.0.103) for the public site and a back MySQL server (192.168.0.102) for the admin extranet, with batch synchronization.

Business Stakes
  • Catalog completeness: the platform's value directly depends on the number and quality of referenced products
  • Data freshness: prices, stock and availability change frequently - automated updates are critical for platform reliability
  • Data quality: multilingual normalization, duplicate management, matching supplier refs to internal refs
  • Performance: with a 15 GB SQL dump and hundreds of thousands of products, import batches and pre-calculations must be optimized
  • Revenue: paid subscription system for suppliers (€240/year), custom reseller mini-sites, advertising
Identified Risks

Significant Technical Debt

Custom PHP framework without an ORM, SQL queries built by string concatenation, reliance on deprecated PHP features (magic_quotes, the ereg functions)

Security Vulnerabilities

Cleartext credentials in configuration files (application.xml), non-parameterized SQL queries in batch scripts, hardcoded FTP passwords

Scalability Limits

Sequential import batches with set_time_limit of 1-4 hours, no queue system in legacy version

Supplier Dependency

Heterogeneous FTP feeds - each supplier has its own CSV/XML format, requiring custom parsing code per supplier

Steps - What I Did

A 12-year journey from legacy takeover to modern async imports

Phase 1
Initial Development & Migration
2007-2010
  • Development of the European Sourcing platform by SQLI with SQL Server database
  • Table-by-table migration from SQL Server to MySQL via ControlAdminImport module
  • Complex data recovery: countries, cities, services, organizations, statuses, subscriptions, families, products, criteria, synonyms, distributors, contacts, suppliers, catalogs, pages, indexations
Phase 2
Functional Enrichment
2010-2016
  • Added catalog import system (page scanning, thumbnails, automatic indexation)
  • Built reseller mini-site system (Kadobjet and 50+ branded domains)
  • Developed CSV/PDF exports, advertising management, newsletter system, statistics dashboard
  • Grew the codebase to 84 VO model classes and 282 Smarty templates across 5 languages
Phase 3
Import Automation (FTP/XML)
2015-2018
  • Built auto_mise_a_jour.php batch system with supplier-specific code (MID OCEAN BRANDS x10 entities, PF Concept, Pixika, Delta, Inspirion, Topico...)
  • Automated FTP download, XML/CSV parsing, reference matching with existing products, price and stock updates
  • Handled heterogeneous supplier formats: multi-format references (int/string/multi with brackets), parasitic whitespace, duplicates
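
The reference clean-up described in this phase can be sketched as follows. This is a minimal illustration in Python, not the original PHP from auto_mise_a_jour.php: the function names, the `ref` column, the bracketed list syntax `[a, b]`, and the choice to uppercase and drop inner whitespace are all assumptions made for the example.

```python
import re


def normalize_reference(raw):
    """Return the clean reference strings found in one raw feed value.

    Assumed input shapes (hypothetical): an int, a plain string, or a
    bracketed multi-reference such as "[AB-1, AB-2]".
    """
    if raw is None:
        return []
    text = str(raw).strip()
    if text.startswith("[") and text.endswith("]"):
        parts = text[1:-1].split(",")
    else:
        parts = [text]
    refs = []
    for part in parts:
        # Drop stray whitespace (including non-breaking spaces) and uppercase.
        cleaned = re.sub(r"\s+", "", part).upper()
        if cleaned and cleaned not in refs:
            refs.append(cleaned)
    return refs


def match_feed(rows, known_refs):
    """Split feed rows into those matching known products and unknown refs.

    The first occurrence of a reference wins; later duplicates are skipped.
    """
    matched, unknown, seen = {}, [], set()
    for row in rows:
        for ref in normalize_reference(row.get("ref")):
            if ref in seen:
                continue  # duplicate inside the same feed
            seen.add(ref)
            if ref in known_refs:
                matched[ref] = row
            else:
                unknown.append(ref)
    return matched, unknown
```

Unknown references are returned rather than dropped, so an operator can review them instead of losing them silently.
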
Phase 4
Extranet v2 & Modern Import
2016-2019
  • Rebuilt extranet on Symfony with Bootstrap UI, Docker/Vagrant dev environment
  • Designed and implemented asynchronous import system with queue (es_core_import)
  • Documented 8 CSV import types with structured schemas (products, variants, markings, pricing, docs, supplier profiles)
  • Added real-time progress tracking, import history with timestamps and operator names, FTP upload for CSV files
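
A queue-based import with progress tracking and history, as listed above, could look roughly like the sketch below. It is written in Python for illustration and is not the es_core_import code: the `ImportJob` fields, the `REQUIRED_COLUMNS` schema, the `;` delimiter, and the status names are hypothetical.

```python
import csv
import io
import queue
import threading
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: required columns per import type.
REQUIRED_COLUMNS = {"products": ["reference", "name"]}


@dataclass
class ImportJob:
    import_type: str
    operator: str
    csv_text: str
    processed: int = 0          # rows handled so far (progress numerator)
    total: int = 0              # rows in the file (progress denominator)
    status: str = "queued"
    errors: list = field(default_factory=list)
    started_at: Optional[datetime] = None


def run_job(job):
    """Validate every row of the job's CSV and update its progress counters."""
    job.status = "running"
    job.started_at = datetime.now(timezone.utc)
    rows = list(csv.DictReader(io.StringIO(job.csv_text), delimiter=";"))
    job.total = len(rows)
    required = REQUIRED_COLUMNS[job.import_type]
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        missing = [c for c in required if not (row.get(c) or "").strip()]
        if missing:
            job.errors.append((line_no, missing))
        job.processed += 1
    job.status = "done_with_errors" if job.errors else "done"


def worker(jobs, history):
    """Consume jobs until a None sentinel arrives; record each in history."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            run_job(job)
        except Exception as exc:  # a failed job must not stop the queue
            job.status = "failed"
            job.errors.append((0, [str(exc)]))
        history.append(job)
        jobs.task_done()


def process_all(job_list):
    """Run the given jobs through one background worker; return the history."""
    jobs, history = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(jobs, history))
    thread.start()
    for job in job_list:
        jobs.put(job)
    jobs.put(None)
    thread.join()
    return history
```

Because the web request only enqueues the job, a long import no longer depends on a request-level time limit, and a progress view can read `processed` and `total` while the worker runs.
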
Automated FTP/XML Update Flow
Project Timeline (2007-2019)

Actors & Interactions

A multi-generational team of 20+ contributors across 12 years

20 contributors identified across PHP source files (@author tags), SVN metadata, Git logs, screenshots, and configuration files.

The project evolved through two distinct team phases:

  • SQLI Phase (2007-2010): 3 main developers (jebanquey, bvermeulen, lhuangoc) produced 79.8% of the original codebase
  • Medialeads Phase (2013-2019): Jose DA COSTA took over as primary developer (99.8% of SVN commits), later joined by Thomas C., fancyweb, amandine, bastien for the modernization

Non-technical stakeholders included import operators (Rahnia Sadaoui, Anthony Brifouillere, Paul Meyer), commercial managers, and 250+ external suppliers with automated feeds.

The development followed an incremental delivery model (no formal Agile methodology identified), with thematic releases organized in a `livraisons/` directory.

Original Code Authorship (@author tags, 155 PHP files)
SVN Contributions (11,868 nodes)
External Stakeholders
  • Medialeads - parent company behind European Sourcing
  • SQLI - IT services company that built the initial platform
  • Systonic - hosting provider (2010)
  • OVH - dedicated server hosting
  • Sogenactif (Societe Generale) - payment gateway
  • 250+ suppliers with automated feeds (MID OCEAN, PF Concept, Pixika, Delta, Inspirion...)
  • 50+ distributors with custom mini-sites (Kadobjet, avotrimage.fr, cadonor.fr, prodiges.com...)

Results

Impact for myself and for the company

For Me
  • Mastered large-scale data import engineering with heterogeneous multi-supplier formats
  • Acquired deep expertise in MySQL architecture (front/back separation, replication, denormalization)
  • Learned to manage a 12-year project lifecycle, balancing legacy maintenance with modernization
  • Developed batch processing skills (PHP CLI, cron scheduling, FTP automation)
  • Transitioned from inherited SQLI codebase to designing the Symfony v2 architecture independently
For the Company
  • European Sourcing became a reference player in the European promotional product market
  • Multi-country coverage (FR, UK, DE, ES, IT) with 250+ referenced suppliers
  • Hundreds of thousands of products with variants, markings, docs - 15 GB database
  • 50+ custom distributor mini-sites active
  • 12 fully functional features delivered: catalog import, bulk data import, auto-updates, multi-role extranet, advanced search, CSV/PDF export, mini-sites, subscriptions, statistics, advertising, newsletters, multilingual management
Codebase Metrics
Import Types - Number of CSV Fields per Type

The Aftermath

What happened after the final deployment

Immediate aftermath (2019): The comprehensive backups of March and August 2019 (including bash_history, SSH keys, screenshots, and multiple SQL dumps) suggest a transition phase - likely related to a departure or project archival. The August 2019 backup includes 441 screenshots systematically documenting the entire ecosystem.

Medium-term: The last modified file in the backup dates to November 2019. The exhaustive nature of the documentation (wiki pages, process documents, deployment scripts) indicates a conscious effort to ensure knowledge transfer.

Current state: The project is archived. The European Sourcing platform continued to operate independently, but the original codebase as documented in these backups represents a snapshot of a mature 12-year system at its final state. The technical choices (Symfony v2, Docker, async queues) demonstrated the team's awareness of the need for modernization.

The longevity of the initial architecture (12 years of production use) validates many of the original design choices, particularly the front/back MySQL separation and the denormalization strategy for multilingual search performance.

Critical Reflection

Honest retrospective on a 12-year engineering journey

Strengths
  • Longevity

    The system operated for 12+ years - evidence of a fundamentally solid architecture

  • Broad Functional Coverage

    Import, export, search, mini-sites, payments, statistics, newsletters - all from a relatively small team

  • Pragmatic Denormalization

    Pre-calculated tables per language (__es_produits_selection_fr/en/de/es/it) effectively solved search performance

  • Mature MySQL Architecture

    Read/write separation with ProxySQL load balancing was ahead of its time for a PHP monolith

  • Automated Supplier Feeds

    FTP/XML auto-update system covered major suppliers, dramatically reducing manual data entry

Areas for Improvement
  • Custom Framework

    Using a homegrown PHP framework instead of Symfony/Laravel created significant technical debt and made hiring/onboarding harder

  • Security Gaps

    Cleartext credentials in code and config, non-parameterized SQL queries in some areas, no CORS/CSP headers identified

  • No Test Coverage

    No unit or integration tests in the legacy version, making every evolution a risk

  • Supplier-Specific Code

    auto_mise_a_jour.php contained hardcoded logic per supplier instead of a generic connector/adapter pattern

  • No CI/CD Pipeline

    Manual deployment via SVN export and shell scripts, no automated testing or deployment pipeline

What I Would Do Differently
  • Adopt a community framework from the start (Symfony 1.x had been available since 2007)
  • Design a generic supplier connector system with adapters per format (CSV, XML, JSON) and declarative configuration
  • Implement a message queue from day one (RabbitMQ, Beanstalkd) instead of batch scripts with set_time_limit
  • Move image storage to object storage (S3) behind a CDN instead of the local filesystem
  • Add automated tests progressively, starting with critical import and reference matching functions
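
The generic connector idea from the list above can be sketched as one reader per format plus a declarative field mapping per supplier. This is a Python illustration of the design only, not code from the project: the supplier names, column names, and configuration keys are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_csv(payload, options):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(payload),
                            delimiter=options.get("delimiter", ";"))
    return [dict(row) for row in reader]


def read_xml(payload, options):
    """Parse XML text; each element named options['item_tag'] is one row."""
    root = ET.fromstring(payload)
    return [{child.tag: (child.text or "") for child in item}
            for item in root.iter(options["item_tag"])]


# One reader per format, shared by all suppliers.
READERS = {"csv": read_csv, "xml": read_xml}

# Declarative per-supplier configuration (all values hypothetical):
# "fields" maps internal field names to the supplier's own column/tag names.
SUPPLIERS = {
    "supplier_a": {
        "format": "csv",
        "options": {"delimiter": ";"},
        "fields": {"reference": "ref", "price": "prix", "stock": "qte"},
    },
    "supplier_b": {
        "format": "xml",
        "options": {"item_tag": "product"},
        "fields": {"reference": "sku", "price": "price", "stock": "stock"},
    },
}


def load_feed(supplier, payload, suppliers=SUPPLIERS):
    """Return the feed as rows using internal field names only."""
    config = suppliers[supplier]
    rows = READERS[config["format"]](payload, config["options"])
    return [
        {target: (row.get(source) or "").strip()
         for target, source in config["fields"].items()}
        for row in rows
    ]
```

With this shape, onboarding a supplier whose feed uses an already supported format means adding a configuration entry rather than a new branch of parsing code.
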
Lasting Lessons
  • Heterogeneous data import is a major engineering problem - each supplier has unique formats, conventions, encodings, and quirks. A robust system needs extensible architecture with adapters, validation, and exhaustive logging.

  • Denormalization is a valid trade-off when read performance is critical and data evolves in batches - but synchronization must be automated and consistency monitored.

  • Long-lived projects inevitably accumulate technical debt - planning refactoring phases is essential rather than waiting for a total rewrite.

  • Migration is a process, not an event - the coexistence of legacy (custom PHP + SVN) and v2 (Symfony + Git) over several years shows that migrations happen incrementally.
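
The denormalization lesson above, with its requirement that synchronization be automated and consistency monitored, can be illustrated with a small sketch. It uses Python and an in-memory SQLite database purely for demonstration (production ran on MySQL). Only the `__es_produits_selection_<lang>` naming comes from the project; the source tables `produits` and `produits_traductions` and their columns are assumptions.

```python
import sqlite3

LANGUAGES = ("fr", "en", "de", "es", "it")


def create_source_tables(conn):
    """Create the (hypothetical) normalized source tables."""
    conn.executescript("""
        CREATE TABLE produits (id INTEGER PRIMARY KEY, reference TEXT NOT NULL);
        CREATE TABLE produits_traductions (
            produit_id INTEGER, langue TEXT, nom TEXT
        );
    """)


def rebuild_selection(conn, lang):
    """Rebuild the pre-calculated search table for one language."""
    if lang not in LANGUAGES:  # also guards the table name built below
        raise ValueError(f"unsupported language: {lang}")
    table = f"__es_produits_selection_{lang}"
    with conn:  # commit on success, roll back on error
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(
            f"CREATE TABLE {table} (produit_id INTEGER, reference TEXT, nom TEXT)"
        )
        conn.execute(
            f"INSERT INTO {table} (produit_id, reference, nom) "
            "SELECT p.id, p.reference, t.nom "
            "FROM produits p "
            "JOIN produits_traductions t ON t.produit_id = p.id "
            "WHERE t.langue = ?",
            (lang,),
        )


def is_consistent(conn, lang):
    """Compare row counts between the source join and the denormalized table."""
    table = f"__es_produits_selection_{lang}"
    expected = conn.execute(
        "SELECT COUNT(*) FROM produits p "
        "JOIN produits_traductions t ON t.produit_id = p.id "
        "WHERE t.langue = ?",
        (lang,),
    ).fetchone()[0]
    actual = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return expected == actual
```

A count comparison is only the cheapest possible check; it catches a missed rebuild after an import batch but not a changed product name, which would need a checksum or timestamp comparison.
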

Full Ecosystem Architecture (Draw.io)

Core Data Model (ER Diagram)
CSV Import Data Flow (v2)
Technology Distribution
Infrastructure Distribution
