AI & NLP

Content Taxonomy & NLP Classification

Two phases: week one designs the taxonomy and validates feasibility; weeks two and three train the classifier and ship the labeling pipeline. You can stop after phase one and keep the plan.

Two-phase engagement · 1 week
Qualification

Is this for you?

  • You have thousands of items with inconsistent or missing categories.
  • Manual tagging is slow, expensive, or impossible at your scale.
  • You need genres, topics, or tags inferred from text descriptions.
  • Search, browse, or recommendations are suffering because labels are messy.
  • You need confidence scoring and a human-review path for edge cases.

Deliverables

What you get

  • Taxonomy audit and label-definition refinement (or creation if missing).
  • Gold-set sampling plan and labeling guidelines.
  • Classifier approach selection and training/adaptation.
  • Evaluation report with precision/recall, thresholds, and failure modes.
  • Labeling pipeline (batch + API interface) with confidence scores.
  • Documentation and handoff, plus a 60-minute walkthrough.
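To make "confidence scores" and a human-review path concrete, here is a minimal sketch of how a labeling pipeline might route predictions. The `Prediction` type, the `route` function, and both cutoffs (0.90 and 0.50) are illustrative assumptions, not the delivered implementation; in practice the cutoffs would come from the evaluation report.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route(predictions, auto_accept=0.90, review_floor=0.50):
    """Split predictions into accepted, review, and unlabeled buckets.

    Both cutoffs are hypothetical defaults chosen for illustration.
    """
    buckets = {"accepted": [], "review": [], "unlabeled": []}
    for p in predictions:
        if p.confidence >= auto_accept:
            buckets["accepted"].append(p)   # label written without review
        elif p.confidence >= review_floor:
            buckets["review"].append(p)     # queued for a human editor
        else:
            buckets["unlabeled"].append(p)  # too uncertain to suggest
    return buckets


# Toy batch with made-up scores.
batch = [
    Prediction("a1", "jazz", 0.97),
    Prediction("a2", "ambient", 0.71),
    Prediction("a3", "folk", 0.32),
]
routed = route(batch)
```

Keeping the two cutoffs as parameters means the review workload can be tuned without retraining the model.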
In and out

Scope

What's in

  • Taxonomy design or refinement
  • Text classification modeling and evaluation
  • Batch inference and scoring pipeline
  • Review and QA workflow design

What's out

  • Long-term MLOps monitoring and retraining
  • Full ingestion pipeline rebuilds
  • Editorial UI implementation
  • Manual labeling at scale beyond a small gold set

How it runs

Process

  1. Intake

    Day 1

    Kickoff, taxonomy and data access, success criteria alignment.

  2. Discovery

    Days 2–5

    Taxonomy review, data sampling, baseline classification experiments.

  3. Modeling

    Days 6–10

    Model selection, training or adaptation, and evaluation with thresholds.

  4. Delivery

    Days 11–15

    Pipeline packaging, documentation, and team walkthrough.
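As an illustration of "evaluation with thresholds" in the modeling step, the sketch below computes precision and recall for one label on a small gold set, then picks the lowest confidence threshold that meets a precision target. The function names, the toy data, and the 0.9 target are assumptions made for the example, not figures from a real engagement.

```python
def precision_recall(gold, scored, label, threshold):
    """Precision and recall for one label when low-confidence predictions are withheld."""
    tp = fp = fn = 0
    for item_id, true_label in gold.items():
        pred_label, conf = scored[item_id]
        predicted = pred_label == label and conf >= threshold
        if predicted and true_label == label:
            tp += 1
        elif predicted:
            fp += 1
        elif true_label == label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(gold, scored, label, target_precision, candidates):
    """Return the lowest candidate threshold that meets the precision target, or None."""
    for t in sorted(candidates):
        precision, _ = precision_recall(gold, scored, label, t)
        if precision >= target_precision:
            return t
    return None


# Toy gold set (item -> true label) and model output (item -> (label, confidence)).
gold = {"a1": "jazz", "a2": "jazz", "a3": "rock", "a4": "jazz", "a5": "rock"}
scored = {
    "a1": ("jazz", 0.95),
    "a2": ("jazz", 0.80),
    "a3": ("jazz", 0.60),
    "a4": ("rock", 0.70),
    "a5": ("rock", 0.90),
}
chosen = pick_threshold(gold, scored, "jazz", target_precision=0.9,
                        candidates=[0.5, 0.7, 0.9])
```

Choosing the lowest threshold that satisfies the precision target keeps recall as high as possible, since raising the threshold can only withhold more predictions.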

How we'd work together

Engagement

Two-phase engagement

Phase 1 designs the taxonomy and validates feasibility in one week, and is a standalone deliverable you can keep. Phase 2 builds the classifier and ships the pipeline once Phase 1 confirms the approach is sound.

Phase 1 — Feasibility & Taxonomy Design
1 week
  • Written taxonomy design document.
  • Feasibility assessment against your actual catalog.
  • Recommended model architecture and approach.
  • Expected accuracy ranges and risk flags.
  • Scope and timeline for Phase 2.

Phase 2 — Classifier Build & Pipeline Deployment
2 weeks
  • Trained classifier tuned to your catalog.
  • Labeling pipeline with confidence scores and review rules.
  • Deployment, documentation, and handoff.

Contingent on Phase 1 sign-off; only engaged once feasibility is confirmed.

Scope may scale for catalogs above 200K items or for multi-language classification. We'll flag this on the discovery call.

Full engagement (Phase 1 + Phase 2)
3 weeks

Phase 1: 50% to start, 50% on delivery. Phase 2 invoiced separately after Phase 1 sign-off. Includes one 30-minute follow-up call within 30 days of final delivery.

Every engagement starts with a discovery call. No prices on the public site — final scope and quote are confirmed once we've talked through your situation.

If the call shows we're not a fit, no engagement, no charge.

FAQ

Questions

Do we need an existing taxonomy?

No. If you have one, we’ll refine it. If not, we’ll design a practical taxonomy with your stakeholders.

What if we don’t have labeled data?

We’ll build a small gold set and bootstrap from there; you don’t need millions of labels to start.
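One common way to draw a small gold set is stratified sampling, so that rare or uncategorized items are not drowned out by the dominant categories. The sketch below assumes a hypothetical `raw_category` field on each catalog item and an arbitrary cap of 25 items per group; the actual sampling plan depends on your catalog.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, per_group, seed=0):
    """Draw up to per_group items from each group so small groups are represented."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Toy catalog: one dominant category, one small one, a few items with no category.
catalog = (
    [{"id": f"r{i}", "raw_category": "rock"} for i in range(500)]
    + [{"id": f"j{i}", "raw_category": "jazz"} for i in range(40)]
    + [{"id": f"u{i}", "raw_category": None} for i in range(3)]
)
gold_candidates = stratified_sample(
    catalog,
    key=lambda item: item["raw_category"] or "uncategorized",
    per_group=25,
)
```

The selected items would then be labeled by hand against the labeling guidelines before any model is trained or evaluated on them.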

Can this run in our stack?

Yes. The deliverable can be a containerized service, a batch job, or a script integrated into your pipeline.

Is this LLM-based?

Sometimes. We choose the most reliable and cost-effective approach for your data, which isn’t always an LLM.

Who you're working with

About

I’ve delivered taxonomy systems, trained on millions of rows of Discogs data, that auto-assign genres and subgenres to tens of thousands of album products. The work is grounded in production-scale classification, not theory.

More about Paper Scissors & Glue

Ready to start?

Book an intro call. If we're not a fit, I'll tell you on the call.

Based in Cluj-Napoca, Romania. Available across EU and US time zones.