Multi-Supplier Product Data Import System

Critical data import engine for the European Sourcing B2B marketplace - 254+ suppliers, 120+ database tables, 8 CSV import types, 5 languages, 7 years of development (2009-2016).

2009 - 2016

~7 years

Software Engineer then Senior Software Engineer

PHP 5.x, MySQL, Symfony 2.x, Smarty, jQuery 1.7, Bootstrap, Docker, Vagrant, Apache, Memcached, ProxySQL, PostgreSQL, FTP, CSV, XML, SVN, Git, DomPDF, PHPMailer

Lines of Code

131,765

PHP application code (excl. third-party)

Suppliers

254+

254+ import folders identified

Database Tables

120+

120+ MySQL tables

Languages

FR, EN, DE, ES, IT

Presentation & Definition

A centralized import engine for the European promotional products market

The Import European Sourcing project is the data import and product management backbone of the European Sourcing platform - a major B2B marketplace connecting suppliers of promotional products (manufacturers, importers) with distributors/resellers across Europe.

The system ingests and normalizes product catalogs from 254+ suppliers spanning heterogeneous formats (CSV, XML, FTP feeds), processes them through validation and transformation pipelines, and writes them into a centralized MySQL database powering a multilingual search engine across 5 European languages.
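As a rough illustration of the validate-then-transform flow described above, here is a minimal Python sketch (the production system was written in PHP). The column names (`ref`, `price`, `name_fr`, ...), the `ProductRecord` and `ImportReport` types, and the validation rules are all assumptions made for the example, not the project's actual schema.

```python
from dataclasses import dataclass, field

SUPPORTED_LANGUAGES = ("fr", "en", "de", "es", "it")


@dataclass
class ProductRecord:
    """Normalized product (hypothetical shape)."""
    supplier_ref: str
    names: dict  # language code -> product name
    price: float


@dataclass
class ImportReport:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs


def validate(row):
    """Return an error message, or None when the row is usable."""
    if not row.get("ref", "").strip():
        return "missing supplier reference"
    try:
        # Accept a decimal comma, common in European feeds.
        float(row.get("price", "").replace(",", "."))
    except ValueError:
        return "price is not a number"
    return None


def transform(row):
    """Convert a validated raw row into a normalized record."""
    names = {
        lang: row[f"name_{lang}"].strip()
        for lang in SUPPORTED_LANGUAGES
        if row.get(f"name_{lang}", "").strip()
    }
    return ProductRecord(
        supplier_ref=row["ref"].strip(),
        names=names,
        price=float(row["price"].replace(",", ".")),
    )


def run_pipeline(rows):
    """Validate each row, keep the good ones, record why others failed."""
    report = ImportReport()
    for row in rows:
        error = validate(row)
        if error:
            report.rejected.append((row, error))
        else:
            report.accepted.append(transform(row))
    return report
```

Keeping rejected rows together with a reason, rather than stopping at the first bad line, is what lets an operator fix a supplier file and re-run it.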

The import system evolved through two major generations:

Legacy (2007-2016): A monolithic PHP application built on a custom MVC framework (SQLI), with batch scripts for catalog import and FTP-based auto-update
Modern (2016-2019): A Symfony-based extranet (v2) introducing asynchronous import queues, structured CSV schemas for 8 import types, and real-time progress tracking

Business Domain

B2B promotional products & merchandise - an industry connecting European manufacturers of branded goods (pens, textiles, office accessories, gifts) with resellers who customize and sell them to end clients.

Functional Scope

Bulk import of supplier catalogs (docs, product data, variants, markings, pricing options, stock levels), automated FTP/XML/CSV feed updates, supplier profile management, catalog PDF generation, and subscription management.

Objectives, Context, Stakes & Risks

Understanding the business drivers behind the technical challenge

Objectives

Centralize catalogs from 250+ European suppliers into a single normalized database
Automate product data updates (prices, stock, references) via FTP/XML/CSV feeds
Provide a multilingual extranet (5 languages: FR, EN, DE, ES, IT) for supplier and distributor management
Generate custom exports in CSV and PDF for distributors and sales teams
Deliver an advanced search engine with phonetic search, synonyms, and multi-criteria filtering

Context

The project was initially developed by SQLI (a French IT services company) on a custom PHP framework with SQL Server, later migrated to MySQL. The initial architecture (circa 2007-2010) relied on a custom MVC pattern with Smarty templating.

The project was owned by Medialeads (the company behind European Sourcing), whose GitHub organization is `github.com/medialeads`. Between 2016 and 2019, a modernized extranet (v2) was developed on Symfony with Bootstrap, Docker/Vagrant for development, and an asynchronous import queue system.

The infrastructure relied on a front/back MySQL architecture with load balancing: a front MySQL server (192.168.0.103) for the public site and a back MySQL server (192.168.0.102) for the admin extranet, with batch synchronization.

Business Stakes

Catalog completeness: the platform's value directly depends on the number and quality of referenced products
Data freshness: prices, stock and availability change frequently - automated updates are critical for platform reliability
Data quality: multilingual normalization, duplicate management, matching supplier refs to internal refs
Performance: with a 15 GB SQL dump and hundreds of thousands of products, import batches and pre-calculations must be optimized
Revenue: paid subscription system for suppliers (€240/year), custom reseller mini-sites, advertising

Identified Risks

Significant Technical Debt

Custom PHP framework without an ORM, SQL queries built by string concatenation, deprecated PHP features (magic_quotes, ereg)

Security Vulnerabilities

Cleartext credentials in configuration files (application.xml), non-parameterized SQL queries in batch scripts, hardcoded FTP passwords

Scalability Limits

Sequential import batches with set_time_limit values of 1-4 hours, no queue system in the legacy version

Supplier Dependency

Heterogeneous FTP feeds - each supplier has its own CSV/XML format, requiring custom parsing code per supplier
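The risk from queries built by string concatenation, listed above, can be shown with a small Python sketch using the standard `sqlite3` module. The `products` table, its columns, and the `update_price` helper are hypothetical; the legacy batches were PHP scripts against MySQL.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (supplier_ref TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("MO-1001", 1.20), ("MO-1002", 2.40)],
    )
    return conn


def update_price(conn, supplier_ref, price):
    """Update one price with bound parameters; return the affected row count."""
    cursor = conn.execute(
        "UPDATE products SET price = ? WHERE supplier_ref = ?",
        (price, supplier_ref),
    )
    return cursor.rowcount


def get_price(conn, supplier_ref):
    row = conn.execute(
        "SELECT price FROM products WHERE supplier_ref = ?", (supplier_ref,)
    ).fetchone()
    return None if row is None else row[0]
```

With bound parameters, a reference such as `MO-1001' OR '1'='1` coming from a supplier file is compared as a literal string and matches no row. Concatenated into the SQL text, the same value would rewrite the `WHERE` clause and update every product.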

Steps - What I Did

A 12-year journey from legacy takeover to modern async imports

Phase 1

Initial Development & Migration

2007-2010

Development of the European Sourcing platform by SQLI with SQL Server database
Table-by-table migration from SQL Server to MySQL via ControlAdminImport module
Complex data recovery: countries, cities, services, organizations, statuses, subscriptions, families, products, criteria, synonyms, distributors, contacts, suppliers, catalogs, pages, indexations

Phase 2

Functional Enrichment

2010-2016

Added catalog import system (page scanning, thumbnails, automatic indexation)
Built reseller mini-site system (Kadobjet and 50+ branded domains)
Developed CSV/PDF exports, advertising management, newsletter system, statistics dashboard
Extended a codebase comprising 84 VO model classes and 282 Smarty templates across 5 languages

Phase 3

Import Automation (FTP/XML)

2015-2018

Built the auto_mise_a_jour.php batch system with supplier-specific code (MID OCEAN BRANDS with 10 entities, PF Concept, Pixika, Delta, Inspirion, Topico...)
Automated FTP download, XML/CSV parsing, reference matching with existing products, price and stock updates
Handled heterogeneous supplier formats: multi-format references (int/string/multi with brackets), parasitic whitespace, duplicates
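The reference clean-up mentioned in the list above can be sketched in Python as follows. The bracketed multi-reference syntax (`[A][B]`) is an assumption about what "multi with brackets" looked like, and `normalize_references` and `match_references` are hypothetical helper names rather than functions from the original batch.

```python
import re


def normalize_references(raw):
    """Return cleaned, de-duplicated references in first-seen order.

    Accepts an int (1001), a plain string ("  MO 8401 ") or an assumed
    bracketed multi-reference string ("[1001][1002]").
    """
    text = str(raw)
    # Assumed multi format: every reference is wrapped in its own brackets.
    parts = re.findall(r"\[([^\[\]]*)\]", text)
    if not parts:
        parts = [text]
    seen = set()
    result = []
    for part in parts:
        # str.split() with no argument drops leading/trailing whitespace and
        # collapses internal runs (tabs, non-breaking spaces included).
        cleaned = " ".join(part.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def match_references(raw, known_refs):
    """Split normalized references into (matched, unmatched) lists."""
    refs = normalize_references(raw)
    matched = [ref for ref in refs if ref in known_refs]
    unmatched = [ref for ref in refs if ref not in known_refs]
    return matched, unmatched
```

Unmatched references are returned rather than discarded, so a batch could log them for an operator instead of silently skipping price or stock updates.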

Phase 4

Extranet v2 & Modern Import

2016-2019

Rebuilt extranet on Symfony with Bootstrap UI, Docker/Vagrant dev environment
Designed and implemented asynchronous import system with queue (es_core_import)
Documented 8 CSV import types with structured schemas (products, variants, markings, pricing, docs, supplier profiles)
Added real-time progress tracking, import history with timestamps and operator names, FTP upload for CSV files
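To make the queue-plus-progress idea above concrete, here is a minimal in-process Python sketch built on `queue.Queue` and a worker thread. It illustrates the pattern only: the real `es_core_import` system ran on Symfony, and the `ImportJob` fields, status names, and `worker` function here are assumptions.

```python
import queue
import threading
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ImportJob:
    """One queued import, with the data a progress page would need."""
    operator: str
    import_type: str
    rows: list
    processed: int = 0
    status: str = "queued"
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    errors: list = field(default_factory=list)

    @property
    def progress(self):
        """Percentage of rows handled so far."""
        if not self.rows:
            return 100
        return round(100 * self.processed / len(self.rows))


def worker(jobs, handle_row):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job.status = "running"
        job.started_at = datetime.now(timezone.utc)
        for row in job.rows:
            try:
                handle_row(row)
            except ValueError as exc:
                # A bad row is recorded but does not abort the whole job.
                job.errors.append(str(exc))
            job.processed += 1
        job.status = "done_with_errors" if job.errors else "done"
        job.finished_at = datetime.now(timezone.utc)
        jobs.task_done()
```

A front end can poll `job.progress` and `job.status` while the worker runs, and the stored operator name and timestamps give the import history for free.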

[Figure: Automated FTP/XML Update Flow]

[Figure: Project Timeline (2007-2019)]

Actors & Interactions

A multi-generational team of 20+ contributors across 12 years

20 contributors identified across PHP source files (@author tags), SVN metadata, Git logs, screenshots, and configuration files.

The project evolved through two distinct team phases:

SQLI Phase (2007-2010): 3 main developers (jebanquey, bvermeulen, lhuangoc) produced 79.8% of the original codebase
Medialeads Phase (2013-2019): Jose DA COSTA took over as primary developer (99.8% of SVN commits), later joined by Thomas C., fancyweb, amandine, bastien for the modernization

Non-technical stakeholders included import operators (Rahnia Sadaoui, Anthony Brifouillere, Paul Meyer), commercial managers, and 250+ external suppliers with automated feeds.

The development followed an incremental delivery model (no formal Agile methodology identified), with thematic releases organized in a `livraisons/` directory.

[Figure: Original Code Authorship (@author tags, 155 PHP files)]

[Figure: SVN Contributions (11,868 nodes)]

External Stakeholders

Medialeads - parent company behind European Sourcing
SQLI - IT services company that built the initial platform
Systonic - hosting provider (2010)
OVH - dedicated server hosting
Sogenactif (Societe Generale) - payment gateway
250+ suppliers with automated feeds (MID OCEAN, PF Concept, Pixika, Delta, Inspirion...)
50+ distributors with custom mini-sites (Kadobjet, avotrimage.fr, cadonor.fr, prodiges.com...)

Results

Impact for myself and for the company

For Me

Mastered large-scale data import engineering with heterogeneous multi-supplier formats
Acquired deep expertise in MySQL architecture (front/back separation, replication, denormalization)
Learned to manage a 12-year project lifecycle, balancing legacy maintenance with modernization
Developed batch processing skills (PHP CLI, cron scheduling, FTP automation)
Transitioned from the inherited SQLI codebase to designing the Symfony v2 architecture independently

For the Company

European Sourcing became a reference player in the European promotional product market
Multi-country coverage (FR, UK, DE, ES, IT) with 250+ referenced suppliers
Hundreds of thousands of products with variants, markings, docs - 15 GB database
50+ custom distributor mini-sites active
12 fully functional features delivered: catalog import, bulk data import, auto-updates, multi-role extranet, advanced search, CSV/PDF export, mini-sites, subscriptions, statistics, advertising, newsletters, multilingual management

Codebase Metrics

[Figure: Import Types - Number of CSV Fields per Type]

The Aftermath

What happened after the final deployment

Immediate aftermath (2019): The comprehensive backups of March and August 2019 (including bash_history, SSH keys, screenshots, and multiple SQL dumps) suggest a transition phase - likely related to a departure or project archival. The August 2019 backup includes 441 screenshots systematically documenting the entire ecosystem.

Medium-term: The last modified file in the backup dates to November 2019. The exhaustive nature of the documentation (wiki pages, process documents, deployment scripts) indicates a conscious effort to ensure knowledge transfer.

Current state: The project is archived. The European Sourcing platform continued to operate independently, but the original codebase as documented in these backups represents a snapshot of a mature 12-year system at its final state. The technical choices (Symfony v2, Docker, async queues) demonstrated the team's awareness of the need for modernization.

The longevity of the initial architecture (12 years of production use) validates many of the original design choices, particularly the front/back MySQL separation and the denormalization strategy for multilingual search performance.

Critical Reflection

Honest retrospective on a 12-year engineering journey

Strengths

Longevity
The system operated for 12+ years - evidence of a fundamentally solid architecture
Broad Functional Coverage
Import, export, search, mini-sites, payments, statistics, newsletters - all from a relatively small team
Pragmatic Denormalization
Pre-calculated tables per language (__es_produits_selection_fr/en/de/es/it) effectively solved search performance
Mature MySQL Architecture
Read/write separation with ProxySQL load balancing was ahead of its time for a PHP monolith
Automated Supplier Feeds
FTP/XML auto-update system covered major suppliers, dramatically reducing manual data entry
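The per-language pre-calculated tables named in the list above can be sketched in Python with `sqlite3`. The source tables (`products`, `product_labels`), their columns, and the `selection_<lang>` naming are assumptions for the example; only the idea of one flattened table per language comes from the original `__es_produits_selection_*` tables.

```python
import sqlite3

LANGUAGES = ("fr", "en", "de", "es", "it")


def rebuild_selection_table(conn, lang):
    """Rebuild the flattened search table for one language; return its row count."""
    # The table name is interpolated into SQL, so only whitelisted codes pass.
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    table = f"selection_{lang}"
    with conn:  # commit everything together, roll back on error
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(
            f"CREATE TABLE {table} ("
            "product_id INTEGER, supplier_ref TEXT, name TEXT, price REAL)"
        )
        conn.execute(
            f"INSERT INTO {table} "
            "SELECT p.id, p.supplier_ref, l.name, p.price "
            "FROM products p JOIN product_labels l ON l.product_id = p.id "
            "WHERE l.lang = ?",
            (lang,),
        )
        conn.execute(f"CREATE INDEX idx_{table}_name ON {table} (name)")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Search queries then read a single narrow table with no joins. The cost is that the function must be re-run after every import batch so the flattened copy does not drift from the source tables.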

Areas for Improvement

Custom Framework
Using a homegrown PHP framework instead of Symfony/Laravel created significant technical debt and made hiring/onboarding harder
Security Gaps
Cleartext credentials in code and config, non-parameterized SQL queries in some areas, no CORS/CSP headers identified
No Test Coverage
No unit or integration tests in the legacy version, making every evolution a risk
Supplier-Specific Code
auto_mise_a_jour.php contained hardcoded logic per supplier instead of a generic connector/adapter pattern
No CI/CD Pipeline
Manual deployment via SVN export and shell scripts, no automated testing or deployment pipeline

What I Would Do Differently

Adopt a community framework from the start (Symfony 1.x had been available since 2007)
Design a generic supplier connector system with adapters per format (CSV, XML, JSON) and declarative configuration
Implement a message queue from day one (RabbitMQ, Beanstalkd) instead of batch scripts with set_time_limit
Separate image storage into a CDN/S3 instead of local filesystem
Add automated tests progressively, starting with critical import and reference matching functions
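The generic connector idea from the list above might look roughly like the following Python sketch, where each supplier is described by configuration and one parser exists per format. The supplier keys, field mappings, and the `read_feed` function are hypothetical, and only CSV and XML are shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical declarative configuration: one entry per supplier feed,
# mapping internal field names to the supplier's own column or tag names.
SUPPLIER_CONFIG = {
    "supplier_a": {
        "format": "csv",
        "delimiter": ";",
        "fields": {"ref": "Reference", "price": "Prix", "stock": "Stock"},
    },
    "supplier_b": {
        "format": "xml",
        "item_tag": "product",
        "fields": {"ref": "sku", "price": "price", "stock": "qty"},
    },
}


def parse_csv(payload, config):
    """Yield normalized rows from CSV text."""
    reader = csv.DictReader(io.StringIO(payload), delimiter=config["delimiter"])
    for row in reader:
        yield {
            target: row[source].strip()
            for target, source in config["fields"].items()
        }


def parse_xml(payload, config):
    """Yield normalized rows from XML text, one per item element."""
    root = ET.fromstring(payload)
    for item in root.iter(config["item_tag"]):
        yield {
            target: (item.findtext(source) or "").strip()
            for target, source in config["fields"].items()
        }


PARSERS = {"csv": parse_csv, "xml": parse_xml}


def read_feed(supplier, payload, configs=SUPPLIER_CONFIG):
    """Pick the parser from configuration and return normalized rows."""
    config = configs[supplier]
    parser = PARSERS[config["format"]]
    return list(parser(payload, config))
```

Adding a supplier with a known format then means adding a configuration entry, and only a genuinely new format requires new code, which is the opposite of hardcoding per-supplier branches in one batch script.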

Lasting Lessons

Heterogeneous data import is a major engineering problem - each supplier has unique formats, conventions, encodings, and quirks. A robust system needs extensible architecture with adapters, validation, and exhaustive logging.
Denormalization is a valid trade-off when read performance is critical and data evolves in batches - but synchronization must be automated and consistency monitored.
Long-lived projects inevitably accumulate technical debt - planning refactoring phases is essential rather than waiting for a total rewrite.
Migration is a process, not an event - the coexistence of legacy (PHP custom + SVN) and v2 (Symfony + Git) over several years shows migrations happen incrementally.