
Data, AI & Machine Learning
Strategic axis of my professional project. ACCENSEO LLM workflows, AdsPower ML platform, Ligneurs ETL pipeline, and SaaS for accountants and credit brokers. Scaling data engineering and applied AI to production-grade systems.
Each segment is a period (journey or achievement) where the competency was applied. The colour and size of the end dot reflect the level reached during that period.
My definition
In my own definition, data, AI, and machine learning is the competency that turns events and text into decisions. It covers relational and NoSQL databases, data engineering, ML fundamentals, and applied LLM workflows (RAG, agentic patterns, evaluation). It is the explicit strategic axis of my 2026-2028 project: integrate generative AI into compliance-aware workflows and operate data at the scale of a regulated vertical B2B SaaS.
I work across three layers in parallel. Storage and modelling: advanced SQL, Prisma modelling (~91 models for the accounting SaaS, ~98 for the broker SaaS), MongoDB and PostgreSQL in production on several hundred GB of RAM at ACCENSEO. Pipelines: custom ETL (the Akeneo-based Ligneurs pipeline), Azure ML Studio ML pipelines (AdsPower, 2016-2018), multi-vendor enrichment (Claude, GPT, Gemini, TRELLIS, TripoSR, Shap-E). Applied AI: hands-on RAG in the ACCENSEO pipeline, classification, 3D generation, multilingual translation, attribute extraction from visuals. The skill is actively levelling toward Senior on the data engineering + applied ML + LLM-Ops triptych.
In 2026, the competitive moat of a vertical B2B SaaS no longer lies in the LLM you pick but in the context you give it: proprietary permissioned data, real task execution with guardrails, and embedded distribution. This is the thesis Microsoft Azure develops in *10 RAG Shifts Redefining Production AI in 2026*: agentic RAG is now the default pattern for answering complex questions and executing actions, and hybrid RAG is the production baseline. The CTO who knows how to design an industrialised RAG pipeline (eval + drift detection + cost per feature) on a regulated domain is the one in demand.
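To make "eval + drift detection + cost per feature" concrete, here is a minimal, framework-agnostic sketch of the kind of eval harness I have in mind. The data structures, function names, and cost constant are illustrative assumptions, not code taken from the ACCENSEO pipeline.

```python
# Sketch of a RAG eval harness: run a fixed set of question/expected-source pairs
# through the pipeline and track answer grounding, latency, and token cost.
# All names and numbers are illustrative assumptions, not production code.
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    expected_source_id: str   # document the answer must be grounded in

def run_eval(cases: list[EvalCase],
             answer_fn: Callable[[str], dict],   # returns {"answer", "source_ids", "tokens"}
             cost_per_1k_tokens: float = 0.01) -> dict:
    hits, latencies, tokens = 0, [], 0
    for case in cases:
        start = time.perf_counter()
        result = answer_fn(case.question)
        latencies.append(time.perf_counter() - start)
        tokens += result["tokens"]
        if case.expected_source_id in result["source_ids"]:
            hits += 1   # the answer cites the expected document: counted as grounded
    return {
        "grounding_rate": hits / len(cases),
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        "cost": tokens / 1000 * cost_per_1k_tokens,
    }
```

Run quarterly on the same case set, the grounding rate and cost per feature become the drift signal the rest of this page keeps referring to.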
My evidence
Anecdote 1: Co-founding AdsPower around AdTech ML pipelines
In January 2016, I co-founded AdsPower as CTO and Technical Project Manager of an early-stage bootstrapped startup. The bet: compete with Optmyzr (US) and Dolead (FR) using an ML-first approach to automatically optimise bids on Google AdWords, Bing Ads and Facebook Ads. The market was dominated by heuristic-based recommendation engines, and Azure ML Studio had just left public preview - the window was real, but so was the challenge: scarcity of ML skills in Bordeaux in 2017 and a limited runway.
I built a complete ML pipeline: a Data Collection Service wired to Google AdWords + Bing Ads + Facebook Ads SDKs, a custom SERP Scraper (Goutte + CasperJS) covering 6 search engines (Google, Bing, Yahoo, Yandex, Baidu, DuckDuckGo) and absorbing more than 10 million requests per month through a Memcached cache + Redis queue, and a Python Flask sidecar running NLTK + TF-IDF for multilingual NLP. On the modelling side, I trained supervised classifiers on Azure ML Studio for bid prediction, k-means clusters for negative-keyword detection, and the Google Prediction API for audience segmentation. The application stack: Symfony 3.2 + Angular with Electron desktop builds (Mac/Windows/Linux) and Cordova mobile (iOS/Android). To source ML freelancers, I ran geo-targeted GitHub searches on the machine-learning tags.
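The negative-keyword idea is easy to sketch: cluster search terms by TF-IDF similarity and flag clusters that never convert. The snippet below is a hedged illustration using scikit-learn with dummy data; the 2016 implementation actually ran on NLTK plus Azure ML Studio, and the real features and thresholds were different.

```python
# Illustrative sketch of negative-keyword detection: cluster search terms by
# TF-IDF similarity, then flag clusters whose terms never convert.
# Dummy data and scikit-learn stand in for the original NLTK / Azure ML Studio stack.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

search_terms = ["cheap running shoes", "running shoes sale", "shoe repair near me",
                "trail running shoes", "how to repair shoes", "running shoes women"]
conversions  = [3, 5, 0, 2, 0, 4]   # conversions observed per search term (dummy data)

vectors = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(search_terms)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# A cluster with zero conversions across all of its terms is a candidate
# source of negative keywords to push back to the ad account.
for cluster in set(labels):
    terms = [t for t, l in zip(search_terms, labels) if l == cluster]
    total = sum(c for c, l in zip(conversions, labels) if l == cluster)
    if total == 0:
        print("negative-keyword candidates:", terms)
```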
Three major product iterations shipped in under a year with a team of 4 freelancers I steered as Technical Project Manager; the platform covered 3 ad networks (Google, Bing, Facebook) with sub-500 ms recommendation latency, and the v1 had 3 active beta testers in November 2016.
That venture taught me viscerally that classification + bid optimisation can be productised - not just demoed in a notebook. The reflexes I forged there (sub-second latency, heuristic fallback when model confidence is low, quality-score monitoring) are the very ones I now replay on the ACCENSEO LLM workflows. AdsPower never reached PMF before the runway ran out, but it was my first real production ML school.
Anecdote 2: Industrialising multi-vendor LLM enrichment at ACCENSEO
At ACCENSEO, one of the recurring themes with my e-commerce and PIM customers is massive AI-driven product enrichment: tens of thousands of product sheets to optimise - automatic taxonomy, SEO rewriting, image processing (background removal, watermarking), 3D model generation, multi-language translation, attribute extraction from visuals. The trap: locking yourself onto a single LLM vendor means inheriting its outages, pricing, and rate limits.
I built the pipeline to be multi-vendor by default. On the text side, I integrated OpenAI GPT, Anthropic Claude and Google Gemini behind a router that picks the model per task (Claude for precision, GPT for creativity, Gemini for lightweight multimodal work). On the 3D side, I wired in TRELLIS, TripoSR, and Shap-E to generate 3D models from product photos. On the image side, automated background processing, cut-out, and watermarking. Orchestration runs through n8n and Make.com for automated workflows and Power Automate for Microsoft triggers, and the whole thing runs on dedicated OVH servers to keep customer catalogue data confidential.
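The routing logic itself stays simple; what matters is that no task has a single point of vendor failure. Below is a hedged sketch of the idea (a task-to-vendor preference table with fallback), with invented names and a stubbed vendor call rather than the actual ACCENSEO code or any real vendor SDK signature.

```python
# Hedged sketch of per-task model routing with vendor fallback.
# The task->vendor table and call_vendor stub are illustrative assumptions;
# real vendor SDK calls (OpenAI, Anthropic, Google) are deliberately not shown.
ROUTING = {
    "attribute_extraction": ["claude", "gpt", "gemini"],   # precision first
    "seo_rewriting":        ["gpt", "claude", "gemini"],   # creativity first
    "image_tagging":        ["gemini", "gpt", "claude"],   # lightweight multimodal first
}

class VendorError(Exception):
    pass

def call_vendor(vendor: str, task: str, payload: dict) -> str:
    raise NotImplementedError("plug the real vendor client here")

def enrich(task: str, payload: dict) -> str:
    last_error = None
    for vendor in ROUTING[task]:
        try:
            return call_vendor(vendor, task, payload)   # first healthy vendor wins
        except VendorError as exc:                      # outage, rate limit, timeout...
            last_error = exc                            # ...fall through to the next vendor
    raise RuntimeError(f"all vendors failed for task {task}") from last_error
```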
Enrichment deployed at scale across the e-commerce platforms of several customers (real estate, fashion, viticulture, automotive, fitted kitchens), a measurable catalogue quality lift without a linear human cost - and an internal product, Addly, derived from this expertise for Confluence/Atlassian Forge.
Through this work I understood that production generative AI is won on observability discipline (token cost, latency, detected hallucination rate) and on multi-vendor strategy, not on prompt sophistication. That is the angle I want to push in the next CTO scale-up role: turn AI into a moat, not into a demo gimmick.
Anecdote 3: Akeneo-to-portal real-estate ETL pipeline (Ligneurs)
For 4 years at Pichet (2019-2023), I was the sole technical owner of the Ligneurs export pipeline - the automated syndication engine for the group's real-estate listings, feeding around 20 partner portals (SeLoger, LeBonCoin, BienIci, LogicImmo...). Across all portals, the syndication generated an estimated one lead every 2 seconds. Any interruption translated directly into lost leads and missed revenue.
I designed a per-partner modular architecture rather than a generic engine: one isolated Docker container per portal, orchestrated by Kubernetes on AWS EKS, with GitLab CI for targeted deployments that did not impact the other flows. On the ETL side, the pipeline extracts from the Akeneo PIM v2 REST API, transforms to each portal's specific format (XML, CSV, JSON), pre-renders multi-format images (4/3, 16/9, panoramic, square) centrally to avoid per-partner reprocessing, and ships via automated FTP/SFTP. I added defensive patterns on heterogeneous sources: circuit breaker on the PIM API, retry logic on FTP uploads, SKU matching algorithm between manual programs and ERP programs. The v1.4 to v2 migration was done portal by portal with business validation at every step, never big-bang.
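As an illustration of those defensive patterns, here is a hedged Python sketch of retry-with-backoff around a flaky step and a simple failure counter acting as a circuit breaker. The Ligneurs pipeline itself ran on a different stack; this only shows the shape of the pattern, and the thresholds are arbitrary.

```python
# Hedged sketch of the defensive patterns: retry with exponential backoff,
# plus a minimal circuit breaker that stops hammering a failing source.
# Illustrative only; not the production Ligneurs code.
import time

class CircuitOpen(Exception):
    pass

class CircuitBreaker:
    def __init__(self, max_failures: int = 5):
        self.failures, self.max_failures = 0, max_failures

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            raise CircuitOpen("too many consecutive failures, stop calling the source")
        try:
            result = fn(*args)
            self.failures = 0          # a success resets the counter
            return result
        except Exception:
            self.failures += 1
            raise

def with_retry(fn, attempts: int = 3, base_delay: float = 2.0):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                                      # exhausted: surface to monitoring
            time.sleep(base_delay * (2 ** attempt))        # exponential backoff before retrying
```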
A zero-downtime migration across every partner portal, centralised monitoring with automated email alerts, and 4 years of continuous operation without a major listing loss - no equivalent system in the department ran with that level of reliability.
That project set the data engineering bar I now carry onto every ACCENSEO engagement: per-partner isolation, batch processing where real-time streaming adds nothing, per-flow observability from day one. It is also where I durably understood architectural data debt: a single generic module looks easy at write time but becomes unmanageable by the tenth partner integration.
My self-critique
Level: Confirmed, actively levelling toward Senior. Foundations are solid: advanced SQL, Prisma modelling (~91 models for the accounting SaaS, ~98 for the broker SaaS), MongoDB and PostgreSQL in production over hundreds of GB at ACCENSEO, Azure ML Studio ML pipelines (AdsPower), and multi-vendor applied LLM workflows (Claude, GPT, Gemini, TRELLIS, TripoSR, Shap-E). What still needs strengthening: industrialised RAG with eval and guardrails, production-grade MLOps (versioning, drift detection), and very large-scale (>TB) data engineering.
This is the explicit strategic axis of my 2026-2027 project. It stitches together three layers: data foundations (rapid schema reading, pipeline audits), applied ML (classification, scoring, recommendation), and generative AI in production (RAG, agents, eval). For a vertical B2B SaaS scale-up CTO role, it is what turns AI into a *moat* rather than a demo gimmick.
A deliberate Confirmed → Senior climb, triggered late 2024 and still ongoing: hands-on RAG plugged into the ACCENSEO pipeline, multi-vendor routing (Claude + GPT + Gemini), AI enrichment of tens of thousands of product sheets. The cadence is measurable quarter by quarter.
To myself: ship one small RAG or agentic project per quarter, with an explicit eval, to keep the competency sharp, and maintain a journal of prompts that work and those that don't. To others: *do not confuse an AI demo with AI production*; invest from day one in pipeline observability (token cost, latency, detected hallucination rate) and in guardrails (sanitisation, rate limits, human fallback). Pick a data-first stack before picking the model stack.
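"Observability from day one" can start very small: wrap every LLM call so token count, latency, and cost are logged per feature. The sketch below is a hedged illustration; the cost table, names, and log destination are assumptions, not a real pricing schedule or production code.

```python
# Hedged sketch: wrap each LLM call to record per-feature token usage, latency and cost.
# The cost table values are placeholders, not actual vendor prices.
import json
import logging
import time

COST_PER_1K_TOKENS = {"claude": 0.003, "gpt": 0.0025, "gemini": 0.002}  # placeholder prices
log = logging.getLogger("ai_observability")

def observed_call(feature: str, vendor: str, call_fn):
    start = time.perf_counter()
    answer, tokens = call_fn()            # the wrapped call returns (text, token_count)
    log.info(json.dumps({
        "feature": feature,
        "vendor": vendor,
        "latency_s": round(time.perf_counter() - start, 3),
        "tokens": tokens,
        "cost": tokens / 1000 * COST_PER_1K_TOKENS[vendor],
    }))
    return answer
```

Aggregated per feature, these log lines are what later make "cost per AI feature" and drift reviews possible without retrofitting instrumentation.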
My evolution in this skill
The 2026-2028 strategic axis
Data and AI are the axis that distinguishes my CTO profile in 2026. In the 24-month plan, they let me frame an AI-augmented vertical B2B SaaS product, hire a coherent data + ML / LLM team, and defend in front of a board an AI product trajectory that separates *moat* from *commodity*. Without that axis, the 2026-2028 CTO role boils down to a modern-stack operator role.
By the end of 2027, the observable goal is to operate a production-grade data + AI platform with an industrialised RAG pipeline (eval + drift detection), an explicit cost per AI feature, and a quarterly quality review. The Confirmed-to-Senior shift is measured against the data engineering + applied ML + LLM-Ops triptych, not against an abstract score.
Hands-on RAG integrated into the ACCENSEO pipeline (Claude + GPT + Gemini multi-vendor, TRELLIS / TripoSR / Shap-E for 3D generation), weekly intake of LLM releases (Anthropic, OpenAI, Mistral, DeepSeek). Master's in Software Engineering in progress until 2026.
A DeepLearning.AI Specialization and Coursera MLOps programs are planned for 2026-2027. A Maven *Applied LLM* cohort (Hamel Husain's, for example) is targeted for 2026. The GCP Professional Data Engineer certification is under consideration, depending on the target context.
Anchor reads: *Designing Machine Learning Systems* (Chip Huyen), *Building LLM Powered Applications* (Valentina Alto), curated arXiv papers. Continuous follows: Latent Space, Eugene Yan, Simon Willison. Monthly routine: one new model evaluated on a real case.