Qualification

Is this for you?

You have thousands of items with inconsistent or missing categories.
Manual tagging is slow, expensive, or impossible at your scale.
You need genres, topics, or tags inferred from text descriptions.
Search, browse, or recommendations are suffering because labels are messy.
You need confidence scoring and a human-review path for edge cases.

Deliverables

What you get

Taxonomy audit and label-definition refinement (or creation if missing).
Gold-set sampling plan and labeling guidelines.
Classifier approach selection and training/adaptation.
Evaluation report with precision/recall, thresholds, and failure modes.
Labeling pipeline (batch + API interface) with confidence scores.
Documentation and handoff, plus a 60-minute walkthrough.
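
To make "confidence scores" and "thresholds" concrete, here is a minimal Python sketch of how predictions might be routed between automatic acceptance and human review, and how precision and recall could be checked against a gold set at a given threshold. Everything named here (`Prediction`, `route`, `precision_recall`, and the 0.85 and 0.50 cut-offs) is a hypothetical illustration, not the delivered implementation; in a real engagement the thresholds would come out of the evaluation report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]


# Hypothetical thresholds, chosen only for illustration.
AUTO_ACCEPT = 0.85
REVIEW_FLOOR = 0.50


def route(pred: Prediction) -> str:
    """Decide what happens to a single prediction based on its confidence."""
    if pred.confidence >= AUTO_ACCEPT:
        return "auto_accept"
    if pred.confidence >= REVIEW_FLOOR:
        return "human_review"
    return "unlabeled"


def precision_recall(preds, gold, label, threshold):
    """Precision and recall for one label, counting a prediction as
    positive only if it carries that label at or above the threshold."""
    tp = fp = fn = 0
    for p in preds:
        predicted = p.label == label and p.confidence >= threshold
        actual = gold[p.item_id] == label
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Raising the threshold typically trades recall for precision, which is why the evaluation report pairs each threshold with both numbers and with the failure modes seen in the review queue.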

In and out

Scope

What's in

Taxonomy design or refinement
Text classification modeling and evaluation
Batch inference and scoring pipeline
Review and QA workflow design

What's out

Long-term MLOps monitoring and retraining
Full ingestion pipeline rebuilds
Editorial UI implementation
Manual labeling at scale beyond a small gold set

How it runs

Process

Intake (Day 1)
Kickoff, taxonomy and data access, and alignment on success criteria.

Discovery (Days 2–5)
Taxonomy review, data sampling, and baseline classification experiments.

Modeling (Days 6–10)
Model selection, training or adaptation, and evaluation with thresholds.

Delivery (Days 11–15)
Pipeline packaging, documentation, and team walkthrough.

How we'd work together

Engagement

Two-phase engagement

Phase 1 designs the taxonomy and validates feasibility in one week, and is a standalone deliverable you can keep. Phase 2 builds the classifier and ships the pipeline once Phase 1 confirms the approach is sound.

Phase 1 — Feasibility & Taxonomy Design

1 week

Written taxonomy design document.
Feasibility assessment against your actual catalog.
Recommended model architecture and approach.
Expected accuracy ranges and risk flags.
Scope and timeline for Phase 2.

Phase 2 — Classifier Build & Pipeline Deployment

2 weeks

Trained classifier tuned to your catalog.
Labeling pipeline with confidence scores and review rules.
Deployment, documentation, and handoff.

Phase 2 is contingent on Phase 1 sign-off and starts only once feasibility is confirmed.

Scope may scale for catalogs above 200K items or for multi-language classification. If either applies, we'll flag it on the discovery call.

Full engagement (Phase 1 + Phase 2)

3 weeks

Phase 1: 50% to start, 50% on delivery. Phase 2 is invoiced separately after Phase 1 sign-off. The full engagement includes one 30-minute follow-up call within 30 days of final delivery.

Book a discovery call

Every engagement starts with a discovery call. Prices aren't listed on the public site; final scope and quote are confirmed once we've talked through your situation.

If the call shows we're not a fit, there's no engagement and no charge.

FAQ

Questions

Do we need an existing taxonomy?

No. If you have one, we’ll refine it. If not, we’ll design a practical taxonomy with your stakeholders.

What if we don’t have labeled data?

We’ll build a small gold set and bootstrap from there; you don’t need millions of labels to start.
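
As a rough illustration of what a small gold set can look like in practice, the Python sketch below draws a fixed number of items from each stratum (for example, each source or each existing noisy category) so that rare groups are not drowned out by common ones. The function name `sample_gold_set`, the `stratum_key` field, and the `per_stratum` size are assumptions made for this example; an actual sampling plan depends on your catalog.

```python
import random
from collections import defaultdict


def sample_gold_set(items, stratum_key, per_stratum, seed=0):
    """Pick up to `per_stratum` items from each stratum for hand labeling.

    `items` is a list of dicts; items with a missing or empty stratum
    value are grouped under "unknown".
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for item in items:
        groups[item.get(stratum_key) or "unknown"].append(item)

    sample = []
    for key in sorted(groups):  # sorted for a stable order across runs
        members = groups[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

Sampling a few dozen items per stratum is usually enough to draft labeling guidelines and run a first baseline, after which the gold set can grow where the classifier is weakest.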

Can this run in our stack?

Yes. The deliverable can be a containerized service, a batch job, or a script integrated into your pipeline.

Is this LLM-based?

Sometimes. We choose the most reliable and cost-effective approach for your data, which isn’t always an LLM.

Who you're working with

About

I’ve delivered taxonomy systems that auto-assign genres and subgenres to tens of thousands of album products, trained on millions of rows of Discogs data. The work is grounded in production-scale classification, not theory.

More about Paper Scissors & Glue

Ready to start?

Book an intro call. If we're not a fit, I'll tell you on the call.

Book an intro call

Not sure? Book a free 20-min call

Content Taxonomy & NLP Classification