DBS Foundation Coding Camp 2024/AI/ML/course

Human Stress Prediction Pipeline

This project moves stress detection from notebook-only experimentation to a structured ML pipeline, with transform, trainer, tuner, and serving artefacts that make the workflow easier to review operationally.

Domain: AI/ML
Role: Machine Learning Engineer
Output: ML Pipeline
Category: Text Classification MLOps

Project Framing

A source-backed case study built for recruiter review

This reading path makes the problem choice, evidence quality, user framing, execution decisions, and proof trail visible without overstating what the sources support.

Project Type
course

TFX-based text-classification pipeline for stress detection with training, tuning, serving artefacts, and reproducible workflow structure.

Orientation
Tech

Shows meaningful MLOps progression by treating model lifecycle and deployment readiness as part of the project, not as an afterthought.

Core Stack
TensorFlow · TFX · Docker · Python

Pipeline-based ML workflow with text preprocessing, model training and tuning, exported serving artefacts, and Docker-supported execution environment.

Why This Problem Mattered

Problem framing before execution

The case-study layer starts with why this problem was selected and how the context justified investment.

Problem Framing Map

Issue

Text-classification experiments become hard to reproduce when preprocessing, training, tuning, and serving are not packaged into one consistent workflow.

Context

The project intentionally moves beyond notebook-only experimentation into a TFX-oriented pipeline structure with explicit component boundaries and serving artefacts.

Why Selected

It is a strong portfolio case because it shows lifecycle thinking in applied ML: not just a model, but an operationally reviewable workflow.

Problem statement

Text-classification experiments become hard to reproduce when preprocessing, training, and serving boundaries are not packaged into one consistent workflow.

Solution thesis

Built a TFX-oriented machine-learning pipeline for stress prediction that separates data handling, training, tuning, and serving artefacts into a repeatable structure.

Research and Evidence

What supports the narrative

Evidence is surfaced with its source type and credibility note so the recruiter can quickly see what is directly backed versus intentionally constrained.

Workflow diagnosis
Source type: local

The core problem is reproducibility drift across preprocessing, training, and serving boundaries.

Credibility: Directly grounded in the project problem, intro, and architecture descriptions.

Operational artefacts
Source type: local

Pipeline components, serving artefacts, and Docker-supported execution are all preserved as evidence.

Credibility: Supported by metrics, responsibilities, and stack decisions in the source-backed project record.

Credibility Notes

  • The project demonstrates MLOps progression through pipeline structure and artefacts, not through production user metrics.
  • No claim is made about live serving traffic, model drift monitoring, or business impact beyond the documented workflow.

Who The User Was

User framing stays explicit

When formal research artefacts are not available, the page still explains who the work served and why that user framing is justified by the existing sources.

Primary user
ML practitioners who need a repeatable text-classification workflow instead of isolated notebook experimentation.

The solution emphasizes reproducibility, modularity, and handoff quality across the model lifecycle.

Operational stakeholder
Reviewers assessing whether the model can move from experimentation toward deployment-ready packaging.

Serving artefacts and Docker support indicate an explicit concern for downstream operational use.

Decision Flow

How design thinking translated into decisions

The goal is to show the trace from research and insight to concrete product or system decisions, then to the outcomes those decisions supported.

Design Thinking Flow

Each step keeps the movement from evidence to action explicit before the rationale expands it.

  1. Reproducibility framing

    Defined the project around workflow consistency rather than model score alone.

    Signal: Pipeline structure became the product of interest.
  2. Boundary design

    Separated transform, trainer, tuner, and serving artefacts to clarify lifecycle responsibilities.

    Signal: TFX-style components anchor the implementation.
  3. Operational packaging

    Added Docker-supported execution and versioned artefacts to improve handoff clarity.

    Signal: The repository becomes more reviewable than a notebook-only setup.
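
The boundary design step above can be pictured with a small sketch. This is plain Python, not the TFX API: the `Stage` class, the `run_pipeline` helper, the artefact keys, and the toy logic inside each stage are all hypothetical stand-ins chosen for illustration. The only point is that each stage reads named artefacts and writes new ones, so responsibilities stay separated and inspectable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Artefacts = Dict[str, object]


@dataclass
class Stage:
    """A named pipeline step that maps existing artefacts to new ones."""
    name: str
    run: Callable[[Artefacts], Artefacts]


def transform(artefacts: Artefacts) -> Artefacts:
    # Hypothetical preprocessing: lowercase and trim each raw text.
    texts = artefacts["raw_texts"]
    return {"clean_texts": [t.lower().strip() for t in texts]}


def tuner(artefacts: Artefacts) -> Artefacts:
    # Placeholder for a hyperparameter search; returns fixed values here.
    return {"hyperparameters": {"min_token_length": 3}}


def trainer(artefacts: Artefacts) -> Artefacts:
    # Toy "model": a sorted vocabulary filtered by the tuned token length.
    min_len = artefacts["hyperparameters"]["min_token_length"]
    vocabulary = sorted(
        {
            token
            for text in artefacts["clean_texts"]
            for token in text.split()
            if len(token) >= min_len
        }
    )
    return {"model": {"vocabulary": vocabulary}}


STAGES: List[Stage] = [
    Stage("transform", transform),
    Stage("tuner", tuner),
    Stage("trainer", trainer),
]


def run_pipeline(stages: List[Stage], initial: Artefacts) -> Artefacts:
    """Run stages in order, refusing to overwrite an earlier artefact."""
    artefacts = dict(initial)
    for stage in stages:
        outputs = stage.run(artefacts)
        overlap = artefacts.keys() & outputs.keys()
        if overlap:
            raise ValueError(f"{stage.name} would overwrite {sorted(overlap)}")
        artefacts.update(outputs)
    return artefacts


if __name__ == "__main__":
    result = run_pipeline(STAGES, {"raw_texts": ["  I feel Overwhelmed ", "Calm day"]})
    print(result["model"])
```

Because every stage output is kept under its own key, a reviewer can inspect the intermediate artefacts rather than only the final model, which is the reviewability benefit the steps above describe.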

Decision Rationale

Each decision keeps the path from insight to execution visible before ending on the outcome signal.

TFX-style pipeline
Insight

Text preprocessing and serving mismatches often emerge when each stage is handled informally.

Decision

Used TFX-oriented pipeline boundaries for transform, training, and tuning.

Outcome

The project demonstrates lifecycle structure, not only experimentation output.
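
The insight about preprocessing and serving mismatches can be illustrated with a minimal sketch. It assumes nothing about the project's actual code: `preprocess_text`, `build_training_features`, and `serve_request` are hypothetical names, and the regex cleaning is a stand-in for whatever text handling the pipeline really performs. The idea shown is that training and serving both call one shared preprocessing function, so the two paths cannot drift apart.

```python
import re
from typing import Dict, List


def preprocess_text(text: str) -> List[str]:
    """Single source of truth for text cleaning (hypothetical rules)."""
    lowered = text.lower()
    # Replace anything that is not a lowercase letter or whitespace.
    cleaned = re.sub(r"[^a-z\s]", " ", lowered)
    return cleaned.split()


def build_training_features(texts: List[str]) -> List[List[str]]:
    # Training path: applies the shared function to every example.
    return [preprocess_text(text) for text in texts]


def serve_request(payload: Dict[str, str]) -> List[str]:
    # Serving path: applies the same function to an incoming request.
    return preprocess_text(payload["text"])


if __name__ == "__main__":
    sample = "Can't SLEEP, deadlines!!"
    print(build_training_features([sample])[0])
    print(serve_request({"text": sample}))
```

In a TFX-style setup this role is usually played by a transform step whose output is reused at serving time; the sketch only mirrors that idea in ordinary functions.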

Docker support
Insight

Heavier ML workflows become harder to reproduce across environments.

Decision

Used Docker to reduce execution drift during setup and review.

Outcome

The runtime environment is more portable and easier to audit.

Solution and System Execution

Execution choices and delivery details

This section preserves the technical and operational substance: architecture, responsibilities, trade-offs, and implementation quality signals.

System Design

Pipeline-based ML workflow with text preprocessing, model training and tuning, exported serving artefacts, and Docker-supported execution environment.

Source-backed Impact

Shows meaningful MLOps progression by treating model lifecycle and deployment readiness as part of the project, not as an afterthought.

Responsibilities

  • Implemented text preprocessing and pipeline components
  • Built model training and tuning workflow
  • Prepared serving artefacts and reproducible pipeline structure

Stack Decisions

  • Used TFX-style pipeline boundaries to improve reproducibility
  • Versioned serving artefacts to support clearer deployment handoff
  • Used Docker to reduce environment drift during execution
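
As a rough illustration of the versioned serving artefacts decision, the sketch below picks the newest export from a directory of numbered versions. The layout is an assumption, not something the project record confirms: it follows the TensorFlow Serving convention of integer-named version subdirectories under a model base path, and `latest_version_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_version_dir(base_dir: Path) -> Optional[Path]:
    """Return the subdirectory with the highest integer name, if any.

    Non-numeric entries (notes, temp folders) are ignored so that only
    real version exports are considered.
    """
    versions = [
        path
        for path in base_dir.iterdir()
        if path.is_dir() and path.name.isascii() and path.name.isdigit()
    ]
    if not versions:
        return None
    # Compare numerically so that "10" ranks above "9".
    return max(versions, key=lambda path: int(path.name))
```

Comparing version names as integers rather than strings matters once exports pass single digits, and returning `None` for an empty directory lets a caller fail with a clear message instead of loading a stale model.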

Trade-offs

  • Accepted greater repository complexity to gain lifecycle clarity
  • Prioritized pipeline structure over minimal notebook simplicity

Challenges

  • Handling text data consistently across training and serving boundaries
  • Keeping pipeline artefacts reviewable despite a heavier project footprint

Outcomes and Proof

What was delivered and what can be verified

Outcome claims remain conservative and source-backed, while proof records and recruiter-safe links surface the strongest verification trail available.

Validation Signals

  • Pipeline components and serving artefacts are preserved in the repository.
  • Operational screenshots and build outputs are described as part of the project evidence.

Source-backed Outcomes

  • Pipeline components and serving artefacts preserved in the repository
  • Operational screenshots and build outputs improve external auditability

Retrospective and Limits

What the project proves, and what it does not

Strong case studies show both what was learned and where the current evidence stops.

Retrospective

The next iteration should add an automated evaluation summary, model-card notes, and stronger test coverage around pipeline components.
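
One possible shape for the automated evaluation summary is sketched below. It is an assumption about a future iteration, not a description of existing code: `evaluation_summary` is a hypothetical helper, the labels in the usage block are toy values rather than project results, and binary labels with `1` meaning "stressed" are assumed.

```python
import json
from typing import Dict, Sequence


def evaluation_summary(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive_label: int = 1,
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "examples": float(len(pairs)),
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when there are no predicted or actual positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not results from the project.
    summary = evaluation_summary([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(json.dumps(summary, indent=2))
```

Emitting the summary as JSON would let it be stored next to the other pipeline artefacts and checked in tests, which fits the stronger test coverage goal stated above.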

Evidence Limits

  • Current sources do not include production monitoring, live endpoint usage, or model-drift governance evidence.
  • Claims about evaluation strength should remain conservative until richer automated metrics are published in a shareable form.

Lessons

  • Pipeline modularity improves iteration speed for applied ML systems
  • Operational artefacts make ML work easier to audit than notebook results alone