May 2026/AI/ML/professional

SST El Niño iForest

This project is structured as a reviewable ML workflow rather than a black-box notebook: input data, model comparisons, and exported outputs are separated so reviewers can audit assumptions and outcomes.

Domain: AI/ML
Role: ML Workflow Builder
Output: ML Pipeline
Category: Climate Anomaly Detection Workflow

Project Framing

A source-backed case study built for recruiter review

This reading path makes the problem choice, evidence quality, user framing, execution decisions, and proof trail visible without overstating what the sources support.

Project Type: professional

Reproducible anomaly-detection workflow using Isolation Forest and comparator models to analyze SST variability as an El Niño proxy signal.

Orientation: Tech

Improves auditability for exploratory climate ML work by preserving clear boundaries between data, analysis notebooks, and generated artefacts.

Core Stack
Python · Jupyter Notebook · scikit-learn · Pandas

Notebook-centered pipeline with explicit dataset input (`data/`), analysis notebooks (`notebooks/`), and reproducible outputs (`results/` + `figures/`).
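
As a minimal sketch of that layout, the four folder names below come from the description above, while the helper names `project_paths` and `ensure_output_dirs` are hypothetical and not taken from the repository. The idea shown is that only the generated-output folders are ever created by code, so inputs and notebooks stay untouched.

```python
from pathlib import Path


def project_paths(root):
    """Map the four documented folders to paths under a project root.

    The folder names follow the project description; this helper is a
    hypothetical illustration, not code from the repository.
    """
    root = Path(root)
    return {
        "data": root / "data",            # dataset input
        "notebooks": root / "notebooks",  # analysis notebooks
        "results": root / "results",      # exported tables
        "figures": root / "figures",      # exported plots
    }


def ensure_output_dirs(paths):
    """Create only the generated-output folders, leaving inputs alone."""
    for key in ("results", "figures"):
        paths[key].mkdir(parents=True, exist_ok=True)
    return paths
```

Keeping the paths in one place means a notebook can write to `paths["results"]` without hard-coding locations in several cells.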

Why This Problem Mattered

Problem framing before execution

The case-study layer starts with why this problem was selected and how the context justified investment.

Problem Framing Map

Issue

Anomaly detection projects are hard to trust when modelling choices and outputs are not reproducible.

Context

The project intentionally frames SST anomaly detection as a proxy-analysis workflow with explicit comparator methods and exported evidence artefacts.

Why Selected

It offers strong reviewer value by coupling ML experimentation with clean evidence packaging, while keeping interpretation claims conservative.

Problem statement

Climate anomaly analysis loses credibility when preprocessing, model choice, and output artefacts are not reproducible.

Solution thesis

Built a repository with notebook-driven anomaly detection, comparator benchmarks, and exported result tables/figures for transparent review.

Research and Evidence

What supports the narrative

Evidence is surfaced with its source type and credibility note so the recruiter can quickly see what is directly backed versus intentionally constrained.

Method scope evidence
public

The README names Isolation Forest as the focal method, with OneClassSVM and LocalOutlierFactor as comparators.

Credibility: Directly documented in repository overview and benchmarking sections.

Output artefact evidence
public

Result files and figures are persisted under dedicated folders for independent review.

Credibility: Corroborated by repository tree and benchmark CSV files.

Credibility Notes

  • Public copy is restricted to reproducibility, method scope, and documented artefact availability.
  • No operational forecasting-accuracy or climate-policy impact claim is made beyond repository-backed analysis outputs.

Who The User Was

User framing stays explicit

When formal research artefacts are not available, the page still explains who the work served and why that user framing is justified by the existing sources.

Primary user
Reviewers or learners evaluating reproducible unsupervised anomaly-detection workflows.

The repository structure is designed to expose data inputs, notebooks, and generated outputs clearly.

Technical stakeholder
ML practitioners comparing anomaly-model behavior on a consistent proxy dataset.

Benchmark comparators and exported metrics enable side-by-side model review.

Decision Flow

How design thinking translated into decisions

The goal is to show the trace from research and insight to concrete product or system decisions, then to the outcomes those decisions supported.

Design Thinking Flow

Each step keeps the movement from evidence to action explicit before the rationale expands it.

  1. Step 1
    Reproducibility-first packaging

    Organized repository into clear data, notebook, results, and figure boundaries.

    Signal: Auditability is treated as a design objective.
  2. Step 2
    Comparator method framing

    Evaluated multiple unsupervised methods rather than reporting a single model in isolation.

    Signal: Model selection discussion gains context and credibility.
  3. Step 3
    Proxy interpretation control

    Positioned outcomes as proxy-signal analysis instead of broad climate prediction claims.

    Signal: Narrative remains conservative and source-backed.

Decision Rationale

Each decision keeps the path from insight to execution visible before ending on the outcome signal.

Executed notebook inclusion
Insight

Reviewers need immediate visibility into outputs before rerunning heavy analysis.

Decision

Included both executed and rerunnable notebook variants.

Outcome

Project supports both quick inspection and full rerun workflows.

Artefact export discipline
Insight

Notebook-only output becomes hard to compare and validate over time.

Decision

Exported benchmark and diagnostic outputs into dedicated CSV/figure files.

Outcome

Result traceability is stronger for technical review and iteration.
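
A minimal sketch of this export step, assuming a pandas DataFrame with one row per model. The file name `benchmark_metrics.csv` matches the results file this page cites, but the function name `export_metrics` and the column names `model` and `n_flagged` are hypothetical, and the values are placeholders rather than project results.

```python
from pathlib import Path

import pandas as pd


def export_metrics(metrics, results_dir, filename="benchmark_metrics.csv"):
    """Write a metrics table to CSV and return the path that was written."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / filename
    # index=False keeps the file to the named columns only.
    metrics.to_csv(out_path, index=False)
    return out_path


# Placeholder table: column names are assumed and the zeros are NOT real results.
placeholder = pd.DataFrame(
    {
        "model": ["IsolationForest", "OneClassSVM", "LocalOutlierFactor"],
        "n_flagged": [0, 0, 0],
    }
)
```

Calling `export_metrics(placeholder, "results")` would write `results/benchmark_metrics.csv`, which can then be compared between runs without reopening the notebook.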

Solution and System Execution

Execution choices and delivery details

This section preserves the technical and operational substance: architecture, responsibilities, trade-offs, and implementation quality signals.

System Design

Notebook-centered pipeline with explicit dataset input (`data/`), analysis notebooks (`notebooks/`), and reproducible outputs (`results/` + `figures/`).

Source-backed Impact

Improves auditability for exploratory climate ML work by preserving clear boundaries between data, analysis notebooks, and generated artefacts.

Responsibilities

  • Structured the anomaly-detection workflow so reviewers can reproduce it
  • Benchmarked multiple unsupervised methods against the same proxy setup
  • Packaged outputs and figures into a transparent artefact hierarchy

Stack Decisions

  • Used Isolation Forest as the focal method while evaluating the comparators under the same setup
  • Kept notebook and output directories separate for traceability
  • Included an executed notebook to speed up reviewer onboarding
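
One way to keep the comparison fair is to fit all three models on the same feature matrix with the same expected outlier share. The sketch below assumes that approach; the function name `flag_anomalies`, the `expected_fraction` parameter, and the 5% default are hypothetical and not taken from the repository.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def flag_anomalies(X, expected_fraction=0.05, seed=0):
    """Return boolean anomaly flags, one column per model, for the same X.

    expected_fraction is an assumed setting. For OneClassSVM, nu is only an
    upper bound on the fraction of training errors, so its flagged share can
    differ somewhat from the other two models.
    """
    models = {
        "IsolationForest": IsolationForest(
            contamination=expected_fraction, random_state=seed
        ),
        "OneClassSVM": OneClassSVM(nu=expected_fraction, gamma="scale"),
        "LocalOutlierFactor": LocalOutlierFactor(contamination=expected_fraction),
    }
    flags = {}
    for name, model in models.items():
        labels = model.fit_predict(X)  # scikit-learn convention: -1 = outlier
        flags[name] = labels == -1
    return pd.DataFrame(flags)
```

The returned table has one row per sample, so agreement between models can be read off directly, for example with `flags.sum()` or `flags.all(axis=1)`.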

Trade-offs

  • Accepted notebook-centric execution in exchange for a transparent exploratory workflow
  • Prioritized reproducibility artefacts over production pipeline orchestration

Challenges

  • Keeping anomaly-proxy interpretation conservative under limited domain context
  • Balancing comparative model breadth with clear reviewer narrative

Execution Visuals

Architecture and outcome snapshot

This visual layer keeps execution readable: how the system or delivery flow was structured and which source-backed outcomes mattered most.

Execution Flow

  1. Step 1
    Data Conditioning

    SST source data is loaded and transformed into monthly anomaly proxy features.

    Signal: Feature preparation is explicit and reviewable.
  2. Step 2
    Model Benchmarking

    Isolation Forest and the comparator models are evaluated under the same analysis flow.

    Signal: Model behavior is interpreted comparatively, not in isolation.
  3. Step 3
    Artefact Export

    Metrics and diagnostics are exported to CSV and figures for transparent review.

    Signal: Outputs remain reusable beyond notebook runtime.
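
Step 1 of the flow above, data conditioning, could look roughly like the following. The page does not say how the monthly anomaly features are computed, so this sketch assumes a common approach: subtracting each calendar month's long-term mean (its climatology) from the series. The function name `monthly_anomalies` is hypothetical.

```python
import pandas as pd


def monthly_anomalies(sst):
    """Subtract each calendar month's mean from a monthly SST series.

    Expects a pandas Series indexed by a DatetimeIndex. This is an assumed
    conditioning step, not necessarily the one used in the notebooks.
    """
    # Mean of all Januaries, all Februaries, and so on, aligned to each row.
    climatology = sst.groupby(sst.index.month).transform("mean")
    return sst - climatology
```

Removing the seasonal cycle first means the anomaly detectors respond to departures from typical conditions for that month rather than to the seasons themselves.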

Outcome Snapshot

  • Comparator Set
    3 unsupervised models

    IsolationForest + OneClassSVM + LocalOutlierFactor

  • Output Surface
    CSV + figure artefacts

    Results and diagnostics stored in dedicated folders

  • Reproducibility
    Executed + clean notebook

    Supports quick audit and rerun pathways

Outcomes and Proof

What was delivered and what can be verified

Outcome claims remain conservative and source-backed, while proof records and recruiter-safe links surface the strongest verification trail available.

Validation Signals

  • README and repository tree expose explicit benchmark and output artefact structure.
  • The dedicated `results/benchmark_metrics.csv` file provides direct model-comparison evidence.

Source-backed Outcomes

  • Comparator benchmark includes IsolationForest, OneClassSVM, and LocalOutlierFactor
  • Result artefacts are exported into dedicated CSV output files
  • Repository preserves both executed and rerunnable notebook variants

Retrospective and Limits

What the project proves, and what it does not

Strong case studies show both what was learned and where the current evidence stops.

Retrospective

The next step is to add deeper domain-grounded interpretation notes and to communicate uncertainty explicitly for the climate context.

Evidence Limits

  • Current work is a reproducible analysis pipeline, not a production climate-monitoring service.
  • Proxy interpretation requires additional domain validation before operational decision use.

Lessons

  • Exploratory ML becomes stronger when output artefacts are explicit and versioned
  • Benchmark comparators improve interpretability even in notebook-first projects