---
title: "Data, AI & Machine Learning - José DA COSTA"
description: "Data, AI, and machine learning is, in my own definition, the competency that **turns events and texts into decisions**. It covers relational and NoSQL databases, data engineering, ML fundamentals, and"
locale: "en"
canonical: "https://portfolio.josedacosta.info/en/skills/data-ai-machine-learning"
source: "https://portfolio.josedacosta.info/en/skills/data-ai-machine-learning.md"
html_source: "https://portfolio.josedacosta.info/en/skills/data-ai-machine-learning"
author: "José DA COSTA"
type: "skill"
slug: "data-ai-machine-learning"
generated_at: "2026-04-26T21:12:47.538Z"
---

# Data, AI & Machine Learning

Icon: 🤖

## My definition

Data, AI, and machine learning is, in my own definition, the competency that **turns events and texts into decisions**. It covers relational and NoSQL databases, data engineering, ML fundamentals, and **applied LLM workflows** (RAG, agentic, evaluation). It is the explicit strategic axis of my 2026-2028 project: **integrate generative AI in compliance-aware workflows** and operate data at the scale of a regulated vertical B2B SaaS.

### Context

I work across **3 layers** in parallel. **Storage and modelling**: advanced SQL, Prisma modelling (~91 models accounting SaaS, 98 broker SaaS), MongoDB and PostgreSQL in production at **several hundred GB of RAM** at ACCENSEO. **Pipelines**: custom ETL (Akeneo Ligneurs), ML pipelines on Azure ML Studio (AdsPower 2016-2018), multi-vendor enrichment (Claude, GPT, Gemini, TRELLIS, TripoSR, Shap-E). **Applied AI**: hands-on RAG in the ACCENSEO pipeline, classification, 3D generation, multilingual translation, attribute extraction from visuals. The skill is **actively levelling toward Senior** on the data engineering + applied ML + LLM-Ops triptych.

### Relevance

In 2026, the competitive moat of a vertical B2B SaaS is **no longer in the chosen LLM** but in the **context you give it**, proprietary permissioned data, real task execution with guardrails, and embedded distribution. This is the thesis Microsoft Azure develops in [10 RAG Shifts Redefining Production AI in 2026](https://medium.com/microsoftazure/10-rag-shifts-redefining-production-ai-in-2026-7acbdd66076c): **agentic RAG** is now the default pattern for answering complex questions and executing actions, and **hybrid RAG** is the production baseline. The CTO who knows how to design an industrialised RAG pipeline (eval + drift detection + cost per feature) on a regulated domain becomes sought after.
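To make "hybrid RAG" concrete: one common way to merge a keyword ranking and a vector ranking is reciprocal rank fusion (RRF). The sketch below is an illustration only, not a description of any pipeline mentioned on this page. The document ids are invented, and `k = 60` is the constant usually quoted for RRF, taken here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Invented example: one lexical ranking and one embedding-based ranking.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, which is the property a hybrid retriever relies on.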

## My evidence

### Co-founding AdsPower around AdTech ML pipelines

**Context:** In January 2016, I co-founded **AdsPower** as **CTO and Technical Project Manager** of an **early-stage bootstrapped startup**. The bet: compete with Optmyzr (US) and Dolead (FR) using an **ML-first** approach to automatically optimise bids on Google AdWords, Bing Ads and Facebook Ads. The market was dominated by heuristic-based recommendation engines, and **Azure ML Studio** had just left public preview - the window was real, but so was the challenge: **scarcity of ML skills in Bordeaux in 2017** and a **limited runway**.

**Action:** I built a complete ML pipeline: a **Data Collection Service** wired to Google AdWords + Bing Ads + Facebook Ads SDKs, a custom **SERP Scraper** (Goutte + CasperJS) covering **6 search engines** (Google, Bing, Yahoo, Yandex, Baidu, DuckDuckGo) and absorbing **more than 10 million requests per month** through a Memcached cache + Redis queue, and a **Python Flask sidecar** running **NLTK + TF-IDF** for multilingual NLP. On the modelling side, I trained supervised classifiers on **Azure ML Studio** for **bid prediction**, **k-means** clusters for negative-keyword detection, and the **Google Prediction API** for audience segmentation. The application stack: **Symfony 3.2 + Angular** with **Electron** desktop builds (Mac/Windows/Linux) and **Cordova** mobile (iOS/Android). To source ML freelancers, I ran **geo-targeted GitHub searches** on the machine-learning tags.
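As an illustration of the TF-IDF weighting mentioned above, here is a minimal sketch in plain Python. It is not the AdsPower sidecar code: the tokenizer is a naive whitespace split (the original relied on NLTK), the idf formula is the plain unsmoothed `log(N / df)` variant, and the sample queries are invented.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented search queries, for illustration only.
queries = [
    "cheap running shoes",
    "running shoes for women",
    "free running shoes download",
]
scores = tf_idf(queries)
```

Terms present in every query ("running", "shoes") get a weight of zero, while rarer terms such as "free" stand out. Rare, off-intent terms of that kind are the signal a negative-keyword detector would look for.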

**Result:** **3 major product iterations** shipped in less than a year with a **team of 4 freelancers** I steered as Technical Project Manager, the platform covering **3 ad networks** (Google, Bing, Facebook) with sub-500 ms recommendation latency, and **3 active beta testers** on the v1 in November 2016.

**Value added:** That venture taught me viscerally that **classification + bid optimisation can be productised** - not just demoed in a notebook. The reflexes I forged there (sub-second latency, heuristic fallback when model confidence is low, quality-score monitoring) are the very ones I now replay on the ACCENSEO LLM workflows. AdsPower never reached PMF before the runway ran out, but it was my first real **production ML** school.
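The "heuristic fallback when model confidence is low" reflex can be sketched as follows. Everything here is hypothetical: the 0.7 threshold, the ±10% clamp and the function names are illustrative assumptions, not the rules AdsPower actually shipped.

```python
from dataclasses import dataclass


@dataclass
class BidSuggestion:
    bid: float
    source: str  # "model" or "heuristic"


def heuristic_bid(current_bid: float, conversion_rate: float, target_rate: float) -> float:
    """Hypothetical rule: scale the bid by conversion_rate / target_rate, capped at +/-10%."""
    if target_rate <= 0:
        return current_bid
    factor = min(1.10, max(0.90, conversion_rate / target_rate))
    return round(current_bid * factor, 2)


def suggest_bid(
    model_bid: float,
    model_confidence: float,
    current_bid: float,
    conversion_rate: float,
    target_rate: float,
    min_confidence: float = 0.7,  # assumed threshold
) -> BidSuggestion:
    """Trust the model only above the confidence threshold, otherwise use the rule."""
    if model_confidence >= min_confidence:
        return BidSuggestion(round(model_bid, 2), "model")
    return BidSuggestion(heuristic_bid(current_bid, conversion_rate, target_rate), "heuristic")


confident = suggest_bid(1.50, 0.9, current_bid=1.00, conversion_rate=0.02, target_rate=0.02)
uncertain = suggest_bid(1.50, 0.4, current_bid=1.00, conversion_rate=0.04, target_rate=0.02)
```

Recording the `source` alongside the bid makes it possible to monitor how often the fallback fires, which is itself a useful quality signal.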

### Industrialising multi-vendor LLM enrichment at ACCENSEO

**Context:** At ACCENSEO, one of the recurring themes with my e-commerce and PIM customers is **massive AI-driven product enrichment**: **tens of thousands of product sheets** to optimise - automatic taxonomy, SEO rewriting, image processing (background removal, watermarking), **3D model generation**, multi-language translation, attribute extraction from visuals. The trap: locking yourself onto a single LLM vendor means inheriting its outages, pricing, and rate limits.

**Action:** I built a **multi-vendor pipeline** by default. On the text side, I integrated **OpenAI GPT, Anthropic Claude and Google Gemini** behind a router that picks the model per task (Claude for precision, GPT for creativity, Gemini for lightweight multimodal). On the 3D side, I wired in **TRELLIS**, **TripoSR**, and **Shap-E** to generate 3D models from product photos. On the image side, automated background processing, cut-out, and watermarking. Orchestration runs through **n8n** and **Make.com** for automated workflows, **Power Automate** for Microsoft triggers, and the whole thing runs on **dedicated OVH servers** to keep customer catalogue data confidential.
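A per-task router with failover, as described above, might look like the sketch below. It is deliberately vendor-agnostic: each vendor is a plain callable, so no real SDK call is shown. The task names and the fallback order are hypothetical; only the first choice per task mirrors the text (Claude for precision, GPT for creativity, Gemini for lightweight multimodal).

```python
from typing import Callable

VendorCall = Callable[[str], str]

# Hypothetical task names; the first entry per task follows the preferences in the text.
ROUTES: dict[str, list[str]] = {
    "attribute_extraction": ["claude", "gpt", "gemini"],
    "seo_rewrite": ["gpt", "claude", "gemini"],
    "image_caption": ["gemini", "gpt", "claude"],
}


class AllVendorsFailed(RuntimeError):
    """Raised when no configured vendor could serve the task."""


def route(task: str, prompt: str, vendors: dict[str, VendorCall]) -> tuple[str, str]:
    """Try vendors in preference order and return (vendor_name, output)."""
    errors = []
    for name in ROUTES[task]:
        call = vendors.get(name)
        if call is None:
            continue  # vendor not configured in this deployment
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, rate limit, timeout...
            errors.append(f"{name}: {exc}")
    raise AllVendorsFailed("; ".join(errors) or f"no vendor configured for {task}")


# Stub vendors standing in for real SDK clients.
def flaky_gpt(prompt: str) -> str:
    raise TimeoutError("rate limited")


stub_vendors: dict[str, VendorCall] = {
    "gpt": flaky_gpt,
    "claude": lambda prompt: f"[claude] {prompt}",
    "gemini": lambda prompt: f"[gemini] {prompt}",
}
vendor, text = route("seo_rewrite", "Rewrite: oak table", stub_vendors)
```

Here the preferred vendor for `seo_rewrite` fails, so the call falls through to the next one in the list instead of failing the enrichment job.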

**Result:** Enrichment deployed at scale across the e-commerce platforms of several customers (real estate, fashion, viticulture, automotive, fitted kitchens), measurable **catalogue quality lift** without a linear human cost - and an internal product **Addly** derived from this expertise for Confluence/Atlassian Forge.

**Value added:** On this work I understood that production generative AI is won on **observability discipline** (token cost, latency, detected hallucination rate) and on **multi-vendor strategy**, not on prompt sophistication. That is the angle I want to push on the next CTO scale-up role: **turn AI into a moat**, not into a demo gimmick.
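The three observability signals named above (token cost, latency, detected hallucination rate) can be aggregated per feature with very little code. The tracker below is a minimal sketch under stated assumptions: the class and field names are hypothetical, the price is a made-up placeholder rather than real vendor pricing, and "flagged" presumes some downstream check that marks an output as hallucinated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeatureStats:
    calls: int = 0
    cost_usd: float = 0.0
    latency_ms: float = 0.0
    flagged: int = 0  # outputs marked as hallucinated by a downstream check


class LLMUsageTracker:
    def __init__(self, price_per_1k_tokens: dict[str, float]) -> None:
        self.prices = price_per_1k_tokens
        self.stats: dict[str, FeatureStats] = defaultdict(FeatureStats)

    def record(self, feature: str, vendor: str, tokens: int,
               latency_ms: float, flagged: bool = False) -> None:
        """Accumulate one LLM call against the product feature that triggered it."""
        s = self.stats[feature]
        s.calls += 1
        s.cost_usd += tokens / 1000 * self.prices[vendor]
        s.latency_ms += latency_ms
        s.flagged += int(flagged)

    def summary(self, feature: str) -> dict[str, float]:
        """Return total cost, average latency and flag rate for one feature."""
        s = self.stats[feature]
        calls = max(s.calls, 1)  # avoid division by zero on unseen features
        return {
            "cost_usd": round(s.cost_usd, 4),
            "avg_latency_ms": s.latency_ms / calls,
            "flag_rate": s.flagged / calls,
        }


# Placeholder price, not real vendor pricing.
tracker = LLMUsageTracker({"vendor_a": 0.01})
tracker.record("seo_rewrite", "vendor_a", tokens=2000, latency_ms=400.0)
tracker.record("seo_rewrite", "vendor_a", tokens=1000, latency_ms=600.0, flagged=True)
report = tracker.summary("seo_rewrite")
```

Keying the statistics by feature rather than by vendor is what makes "cost per feature" answerable when several vendors serve the same feature.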

### Akeneo-to-portal real-estate ETL pipeline (Ligneurs)

**Context:** For **4 years** at Pichet (2019-2023), I was the **sole technical owner** of the Ligneurs export pipeline - the **automated syndication engine** for the group's real-estate listings, feeding around 20 partner portals (SeLoger, LeBonCoin, BienIci, LogicImmo...). The system fed an estimated volume of **one lead every 2 seconds** across all portals. Any interruption translated directly into **lost leads** and missed revenue.

**Action:** I designed a **per-partner modular architecture** rather than a generic engine: one **isolated Docker container per portal**, orchestrated by **Kubernetes on AWS EKS**, with **GitLab CI** for targeted deployments that did not impact the other flows. On the ETL side, the pipeline extracts from the **Akeneo PIM v2 REST API**, transforms to each portal's specific format (XML, CSV, JSON), pre-renders **multi-format images** (4/3, 16/9, panoramic, square) centrally to avoid per-partner reprocessing, and ships via automated **FTP/SFTP**. I added **defensive patterns** on heterogeneous sources: **circuit breaker** on the PIM API, **retry logic** on FTP uploads, **SKU matching algorithm** between manual programs and ERP programs. The **v1.4 to v2 migration** was done **portal by portal** with business validation at every step, never big-bang.
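The two defensive patterns named above, a circuit breaker and retry logic, can be sketched generically. This is not the Ligneurs code: the thresholds, delays and names are hypothetical, and the wrapped functions would be the PIM API call and the FTP upload respectively. The clock and sleep functions are injectable so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is short-circuited instead of hitting the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("upstream marked unhealthy")
            # Cooldown elapsed: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(fn: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be >= 1")
```

The two patterns complement each other: retries absorb transient upload errors, while the breaker stops a failing upstream from being hammered by every partner flow at once.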

**Result:** **Zero-downtime migration** across every partner portal, centralised monitoring with automated email alerts, and the pipeline ran in **continuous operation for 4 years** without a major listing loss - no equivalent in the department was running with that level of reliability.

**Value added:** That project raised the **data engineering bar** I now carry on every ACCENSEO engagement: per-partner isolation, batch processing where real-time streaming brings nothing, observability per flow from day one. It is also where I durably understood **data architectural debt**: a generic single module looks easy at write time but becomes unmanageable at the tenth partner integration.

## My self-critique

### Mastery level

Level **Confirmed, actively levelling toward Senior**. Foundations are solid: advanced SQL, Prisma modelling (~91 models accounting SaaS, 98 broker SaaS), MongoDB and PostgreSQL in production over hundreds of GB at [ACCENSEO](/en/journey/cto-founder-directeur-technique-accenseo), ML pipelines on Azure ML Studio (AdsPower), and multi-vendor applied LLM workflows (Claude, GPT, Gemini, TRELLIS, TripoSR, Shap-E). What still needs strengthening: **industrialised RAG** with eval and guardrails, production-grade MLOps (versioning, drift detection), and very large-scale data engineering (>TB).

### Importance in my profile

**Explicit strategic axis** of my 2026-2028 project. It stitches three layers: data foundations (rapid schema reading, pipeline audit), applied ML (classification, scoring, recommendation), and generative AI in production (RAG, agents, eval). For a vertical B2B SaaS scale-up CTO role, it is what turns AI into a *moat* rather than a demo gimmick.

### Acquisition speed

Deliberate **Confirmed → Senior** climb triggered late 2024 and still ongoing: hands-on RAG plugged into the ACCENSEO pipeline, **multi-vendor** (Claude + GPT + Gemini), AI enrichment of tens of thousands of product sheets. Cadence is measurable quarter by quarter.

### Advice (for myself and others)

To myself: ship **one small RAG or agentic project per quarter**, with explicit eval, to keep the competency sharp, and maintain a journal of the prompts that work and those that don't. To others: *do not confuse an AI demo with AI production*. Invest from day one in pipeline observability (token cost, latency, detected hallucination rate) and in guardrails (sanitisation, rate limit, human fallback). Pick a data-first stack before the model stack.
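"Explicit eval" can start very small. The sketch below computes a hit rate at k for a retriever over a hand-labelled question set. It is one possible starting point, not a full RAG evaluation: the retriever is a stub, and the questions and document ids are invented.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # question -> ranked document ids


def hit_rate_at_k(retriever: Retriever, labelled: dict[str, set[str]], k: int = 3) -> float:
    """Fraction of questions with at least one relevant document in the top k."""
    if not labelled:
        return 0.0
    hits = sum(
        1 for question, relevant in labelled.items()
        if relevant & set(retriever(question)[:k])
    )
    return hits / len(labelled)


# Invented labelled set and a stub retriever backed by a fixed lookup table.
labelled_questions = {"q1": {"d1"}, "q2": {"d9"}}
fake_index = {"q1": ["d3", "d1", "d2"], "q2": ["d4", "d5", "d6"]}
score = hit_rate_at_k(lambda question: fake_index[question], labelled_questions, k=3)
```

Tracking this single number quarter after quarter on the same labelled set is enough to see whether a change to chunking, embeddings or prompts helped or hurt retrieval.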

## My evolution in this skill

### Role in my professional project

### The 2026-2028 strategic axis

Data and AI are **the axis that distinguishes my CTO profile in 2026**. In the 24-month plan, they let me frame an AI-augmented vertical B2B SaaS product, hire a coherent data + ML / LLM team, and defend an AI product trajectory in front of a board separating *moat* from *commodity*. Without that axis, the 2026-2028 CTO role boils down to a modern-stack operator role.

### Mid-term target level

By the end of 2027, the observable goal is to **operate a production-grade data + AI platform** with an industrialised RAG pipeline (eval + drift detection), an explicit cost per AI feature, and a quarterly quality review. The Confirmed-to-Senior shift is measured on the triple mastery of data engineering + applied ML + LLM-Ops, not on an abstract score.

### Current training

Hands-on RAG integrated into the ACCENSEO pipeline (Claude + GPT + Gemini multi-vendor, TRELLIS / TripoSR / Shap-E for 3D generation), weekly intake of LLM releases (Anthropic, OpenAI, Mistral, DeepSeek). Master in Software Engineering active until 2026.

### Future training

[DeepLearning.AI](https://www.deeplearning.ai/) Specialization and Coursera MLOps programs planned 2026-2027. Maven *Applied LLM* cohort (Hamel Husain for example) targeted 2026. GCP Professional Data Engineer certification considered depending on the target context.

## Progression across journey

This skill was developed across 12 different journey items.

- **1999** - [CTO · Founder · technical director](https://portfolio.josedacosta.info/en/journey/celiane-founder.md) (entrepreneurship) - Confidence: 2/5
- **2001** - [BTS IG (IT Management)](https://portfolio.josedacosta.info/en/journey/bts-computer-science.md) (education) - Confidence: 2/5
- **2008** - [Junior Software Engineer · PHP Joomla Webmaster Developer](https://portfolio.josedacosta.info/en/journey/ministere-sante-webmaster.md) (experience) - Confidence: 2/5
- **2009** - [Software Engineer · PHP Zend Framework Developer](https://portfolio.josedacosta.info/en/journey/european-sourcing-engineer.md) (experience) - Confidence: 5/5
- **2013** - [Senior Software Engineer · Lead PHP Symfony Developer](https://portfolio.josedacosta.info/en/journey/medialeads-senior-engineer.md) (experience) - Confidence: 4/5
- **2016** - [Technical Project Manager · Co-founder · Early-Stage Startup](https://portfolio.josedacosta.info/en/journey/adspower-cofounder.md) (entrepreneurship) - Confidence: 5/5
- **2017** - [Senior Software Engineer · Lead PHP Magento Developer](https://portfolio.josedacosta.info/en/journey/smile-senior-engineer.md) (experience) - Confidence: 4/5
- **2019** - [Engineering Manager · Project Manager / Product Owner · Technical Lead](https://portfolio.josedacosta.info/en/journey/pichet-group.md) (experience) - Confidence: 5/5
- **2019** - [Technical Lead · Flows and Products: content and enterprise integration](https://portfolio.josedacosta.info/en/journey/pichet-technical-lead.md) (experience) - Confidence: 4/5
- **2020** - [Entrepreneur · Various Business Domains](https://portfolio.josedacosta.info/en/journey/auto-entrepreneur-jdc.md) (entrepreneurship) - Confidence: 4/5
- **2023** - [Master Expert in Software Engineering](https://portfolio.josedacosta.info/en/journey/master-software-engineering.md) (education) - Confidence: 4/5
- **2024** - [CTO · Founder · technical director](https://portfolio.josedacosta.info/en/journey/accenseo-founder.md) (entrepreneurship) - Confidence: 5/5

## Related achievements

- [Multi-Supplier Product Data Import System](https://portfolio.josedacosta.info/en/achievements/import-european-sourcing.md) - Designed and operated multi-format ETL system (CSV/XML/FTP) with denormalization for search performance. Managed MySQL front/back architecture with ProxySQL, replication, and multilingual denormalized tables
- [Intelligent Accounting SaaS Platform](https://portfolio.josedacosta.info/en/achievements/plateforme-comptabilite-saas.md) - 91 Prisma models with complex relational schema: chart of accounts, journals, entries, bank reconciliation, FEC
- [European B2B Search Engine for Promotional Products (European Sourcing)](https://portfolio.josedacosta.info/en/achievements/moteur-de-recherche-europeen-b2b-objets-publicitaires.md) - Designed and optimized a 97-table MySQL schema with master-slave replication - SQL normal forms (1NF/2NF/3NF/BCNF), advanced indexing (B-tree, composite, covering), constant EXPLAIN plan analysis and progression to PostgreSQL full-text (tsvector/GIN) then Elasticsearch
- [EuropeanTool - B2B Promotional Product Platform](https://portfolio.josedacosta.info/en/achievements/europeantool-plateforme-b2b.md) - Managed 15 GB MySQL database with 50+ tables, complex product catalog schemas, and export optimization
- [B2B Product Data Export Platform](https://portfolio.josedacosta.info/en/achievements/export-donnees-produits-b2b.md) - Designed MySQL schema for export management with deadlock prevention, concurrent access control, and complex multi-table queries
- [Food Truck & Mobile Concept Platform - French manufacturer (alias MCR)](https://portfolio.josedacosta.info/en/achievements/plateforme-food-truck-concepts-mobiles.md) - Shaped a 133-table PostgreSQL schema through Payload CMS collections and Drizzle ORM, including 46 versioning tables
- [Centralized Multilingual Translation Management Platform](https://portfolio.josedacosta.info/en/achievements/plateforme-gestion-traductions-multilingues.md) - Queried 7 MySQL translation tables via Doctrine DBAL with search, pagination and validation tracking
- [E-Commerce Platform Redesign Magento Enterprise Edition (alias Fleurance Nature)](https://portfolio.josedacosta.info/en/achievements/refonte-ecommerce-magento-fleurancenature.md) - Magento EAV schema (6+ tables per product), MySQL, 4 customer groups x 3 websites pricing matrix (12 combinations with catalog + cart rules)
- [ETL Pipeline for Real Estate Listing Syndication (alias Ligneurs)](https://portfolio.josedacosta.info/en/achievements/pipeline-etl-syndication-immobiliere.md) - End-to-end ETL pipeline from PIM Akeneo to several dozen portals: extraction, multi-format transformation (XML/CSV/JSON), FTP/SFTP delivery, monitoring
- [PIM Extranet for B2B Promotional Products Search Engine (European Sourcing)](https://portfolio.josedacosta.info/en/achievements/extranet-pim-b2b-objets-publicitaires.md) - MySQL then PostgreSQL with Doctrine, 6-step CSV import pipeline, 37 supplier connectors via FTP/HTTP/REST
- [E-Commerce Site Generator with Customization CMS (alias MyEasyWeb)](https://portfolio.josedacosta.info/en/achievements/generateur-sites-ecommerce-avec-cms.md) - Managed 67 database entities with MySQL and Doctrine ORM across multi-tenant architecture

Interactive version with navigation: https://portfolio.josedacosta.info/en/skills/data-ai-machine-learning
