DBS Foundation Coding Camp 2024 · AI/ML · course

Nongzhanguan Air Quality Analysis

This project focuses on exploratory environmental analysis rather than predictive deployment. It uses pollutant and weather measurements to surface interpretable patterns that are relevant to public-health reasoning and air-quality monitoring.

Domain: AI/ML
Role: Machine Learning Engineer
Output: Research/Case Study
Category: Environmental Time-Series EDA

Project Framing

A source-backed case study built for recruiter review

This reading path makes the problem choice, evidence quality, user framing, execution decisions, and proof trail visible without overstating what the sources support.

Project Type
course

Environmental data analysis project that studies pollutant and weather patterns through exploratory analysis and time-series framing.

Orientation
Tech

Shows practical ML-adjacent analytical skill by turning raw environmental measurements into readable findings instead of leaving the project at descriptive plots alone.

Core Stack
Python · Pandas · Matplotlib · Jupyter Notebook

Notebook-based EDA workflow using CSV datasets, preprocessing steps, missing-value handling, and time-series visual analysis for pollution and weather features.

Why This Problem Mattered

Problem framing before execution

The case-study layer starts with why this problem was selected and how the context justified investment.

Problem Framing Map

Issue

Air-quality datasets are hard to interpret without careful missing-value handling, time-series framing, and context that links pollution patterns to monitoring decisions.

Context

The project was built as an exploratory analysis artefact where transparency of preprocessing and interpretation mattered more than forcing predictive claims.

Why Selected

It adds analytical depth to the portfolio by showing how environmental data can be turned into interpretable findings with conservative evidence handling.

Problem statement

Air-quality datasets are hard to interpret without clear time-series framing, careful missing-value handling, and context that connects pollutant signals to real monitoring decisions.

Solution thesis

Built an exploratory analysis workflow that examines pollution and weather measurements over time, documents imputation choices, and translates chart patterns into interpretable environmental insights.

Research and Evidence

What supports the narrative

Evidence is surfaced with its source type and credibility note so the recruiter can quickly see what is directly backed versus intentionally constrained.

Data interpretation framing
Source type: local

The project explicitly focuses on pollution and weather measurements over time, not only on static descriptive statistics.

Credibility: Backed by the problem statement, architecture description, and notebook-backed source references.
Preprocessing transparency
Source type: local

Missing-value handling is documented as part of the analysis workflow.

Credibility: Supported by the project metrics, responsibilities, and source-backed notebook narrative.

Credibility Notes

  • The project is positioned as transparent exploratory analysis, not as a production forecasting or environmental decision-support platform.
  • No causal or policy-impact claim is added beyond the descriptive and interpretive evidence present in the notebook and README.
Who The User Was

User framing stays explicit

When formal research artefacts are not available, the page still explains who the work served and why that user framing is justified by the existing sources.

Primary user
Reviewers or analysts who need a clear explanation of pollutant patterns, preprocessing choices, and time-series interpretation.

The strongest project value comes from analytical transparency and traceable reasoning rather than an end-user product surface.

Decision stakeholder
Readers who want environmental signals translated into understandable monitoring insights.

The source-backed framing explicitly connects charts and patterns to practical interpretability.

Decision Flow

How design thinking translated into decisions

The goal is to show the trace from research and insight to concrete product or system decisions, then to the outcomes those decisions supported.

Design Thinking Flow

Each step keeps the movement from evidence to action explicit before the rationale expands it.

  1. Interpretation-first framing

    Defined the work around making air-quality data understandable before optimizing for advanced modeling.

    Signal: Time-series reasoning and missing-value handling became central to the workflow.
  2. Transparent preprocessing

    Made imputation and dataset handling part of the narrative instead of hiding them behind final charts.

    Signal: Analytical trust depends on visible preprocessing decisions.
  3. Insight translation

    Converted chart patterns into cautious explanatory findings relevant to environmental interpretation.

    Signal: The project balances data work with readable analytical storytelling.

Decision Rationale

Each decision keeps the path from insight to execution visible before ending on the outcome signal.

Notebook-based transparency
Insight

EDA becomes less trustworthy when preprocessing and chart reasoning are separated from the analytical narrative.

Decision

Kept the analysis notebook-driven and explicit about data handling choices.

Outcome

The project remains easier to audit and discuss as an analysis workflow.

Conservative insight framing
Insight

Weak feature relationships and recurring seasonal patterns can still be valuable findings without implying predictive certainty.

Decision

Framed results as interpretable analytical signals rather than exaggerated model-level conclusions.

Outcome

The case stays credible and source-safe for recruiter review.

Solution and System Execution

Execution choices and delivery details

This section preserves the technical and operational substance: architecture, responsibilities, trade-offs, and implementation quality signals.

System Design

Notebook-based EDA workflow using CSV datasets, preprocessing steps, missing-value handling, and time-series visual analysis for pollution and weather features.
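
As a rough illustration of the first steps of such a workflow, the sketch below reads a small CSV, builds a timestamp index, and reports the share of missing values per column. The column names (`year`, `month`, `day`, `hour`, `PM2.5`, `TEMP`), the helper names, and the inline sample rows are assumptions made for this example; they are not taken from the project notebook.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV; column names are assumed.
SAMPLE_CSV = """year,month,day,hour,PM2.5,TEMP
2013,3,1,0,4.0,-0.7
2013,3,1,1,,-1.1
2013,3,1,2,7.0,
2013,3,1,3,6.0,-1.4
"""

DATE_PARTS = ["year", "month", "day", "hour"]


def load_hourly(source) -> pd.DataFrame:
    """Read the CSV and index it by a timestamp built from the date-part columns."""
    df = pd.read_csv(source)
    # pd.to_datetime can assemble timestamps from columns named year/month/day/hour.
    timestamps = pd.DatetimeIndex(pd.to_datetime(df[DATE_PARTS]))
    return df.drop(columns=DATE_PARTS).set_index(timestamps).sort_index()


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, recorded before any imputation."""
    return df.isna().mean()


hourly = load_hourly(io.StringIO(SAMPLE_CSV))
shares = missing_share(hourly)
print(shares)
```

Recording the missing share before any filling is one way to keep the preprocessing step reviewable alongside the charts that follow it.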

Source-backed Impact

Shows practical ML-adjacent analytical skill by turning raw environmental measurements into readable findings instead of leaving the project at descriptive plots alone.

Responsibilities

  • Prepared the environmental dataset for analysis
  • Handled missing values and exploratory preprocessing
  • Analyzed pollutant and weather patterns through time-series visualization

Stack Decisions

  • Used notebook workflow to keep exploration transparent and reviewable
  • Used time-series analysis to emphasize recurring pollution behavior
  • Preserved a simple analytical structure instead of forcing predictive claims
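
One way to make recurring pollution behavior visible is to average a pollutant series by calendar month and to flag unusually high days. The sketch below does this on a synthetic daily series; the winter-high cosine pattern, the `PM2.5` name, and the mean-plus-one-standard-deviation threshold are all assumptions chosen for illustration, not findings or settings from the project.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for one year; real values would come from the dataset.
days = pd.date_range("2014-01-01", "2014-12-31", freq="D")
day_offset = days.dayofyear.to_numpy() - 1

# Assumed shape: a cosine that peaks on 1 January and bottoms out mid-year.
values = 60 + 40 * np.cos(2 * np.pi * day_offset / 365)
pm25 = pd.Series(values, index=days, name="PM2.5")

# Average by calendar month to expose the recurring yearly profile.
monthly_profile = pm25.groupby(pm25.index.month).mean()

# Flag days well above the typical level as candidate spikes (assumed rule).
threshold = pm25.mean() + pm25.std()
spike_days = pm25[pm25 > threshold]

print(monthly_profile.round(1))
print(f"{len(spike_days)} days above {threshold:.1f}")
```

Grouping by `index.month` compares like with like across the calendar; on multi-year data the same grouping pools every January together, which is what makes a seasonal pattern stand out from year-specific noise.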

Trade-offs

  • Accepted lower operational maturity in exchange for clearer analytical storytelling
  • Kept scope centered on interpretation instead of extending into unsupported forecasting claims

Challenges

  • Making missing-value treatment understandable in an environmental monitoring context
  • Translating weak correlations and noisy signals into useful analytical conclusions
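
A small sketch of how weak relationships might be reported rather than discarded: compute Pearson correlations against the pollutant column and attach a plain-language strength label. The readings, the column names (`PM2.5`, `TEMP`, `WSPM`), the `label_strength` helper, and the 0.3 and 0.7 cut-offs are illustrative assumptions, not values from the project.

```python
import pandas as pd

# Hypothetical readings; column names and values are illustrative only.
readings = pd.DataFrame(
    {
        "PM2.5": [35.0, 80.0, 52.0, 120.0, 41.0, 95.0],
        "TEMP": [12.0, 3.0, 15.0, 8.0, 1.0, 10.0],
        "WSPM": [3.1, 0.9, 2.4, 0.6, 2.8, 1.2],
    }
)


def label_strength(r: float) -> str:
    """Bucket |r| with rough cut-offs (a common convention, not a fixed rule)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


# Pearson correlation of each weather feature against the pollutant column.
pearson = readings.corr(method="pearson")["PM2.5"].drop("PM2.5")
summary = pearson.to_frame("r")
summary["strength"] = summary["r"].map(label_strength)

print(summary)
```

Keeping the coefficient next to its label lets a reader see that "weak" is a measured result, and Pearson only captures linear association, so a weak value does not rule out a non-linear or lagged relationship.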
Outcomes and Proof

What was delivered and what can be verified

Outcome claims remain conservative and source-backed, while proof records and recruiter-safe links surface the strongest verification trail available.

Validation Signals

  • README, notebook, and dataset analysis artefacts are preserved in the local archive.
  • Missing-value handling is documented as part of the analytical workflow.

Source-backed Outcomes

  • Time-series framing used to reveal recurring pollution spikes and seasonality
  • Missing-value handling documented as part of the analytical workflow
Retrospective and Limits

What the project proves, and what it does not

Strong case studies show both what was learned and where the current evidence stops.

Retrospective

The next iteration should add compact numeric findings, setup instructions, and clearer notes on the trade-offs of forward-fill imputation.
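
To make the forward-fill trade-off concrete, the sketch below fills gaps in a short series while capping how far a stale reading is carried and keeping a mask of imputed points. The readings, the `PM2.5` name, and the `MAX_GAP` value are assumptions for illustration; the source does not state which limit, if any, the notebook uses.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly PM2.5 readings with one short gap and one long gap.
hours = pd.date_range("2014-01-01", periods=10, freq=pd.Timedelta(hours=1))
raw = pd.Series(
    [50.0, np.nan, 54.0, np.nan, np.nan, np.nan, np.nan, 90.0, 88.0, np.nan],
    index=hours,
    name="PM2.5",
)

# Cap how many consecutive hours a stale reading is carried (assumed setting).
MAX_GAP = 2
filled = raw.ffill(limit=MAX_GAP)

# Keep masks so imputed points can be marked on charts or excluded later.
was_imputed = raw.isna() & filled.notna()
still_missing = filled.isna()

print(pd.DataFrame({"raw": raw, "filled": filled, "imputed": was_imputed}))
```

Forward-fill repeats the last observed value, so the series stays flat for the length of the gap and any rise during an outage is not represented. In this example the long gap hides most of the climb from 54 to 90, which is why capping the fill and flagging imputed points can be preferable to an unlimited fill.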

Evidence Limits

  • Current sources do not support production monitoring, forecasting-service deployment, or policy-outcome claims.
  • The project should remain framed as exploratory environmental analysis, not a validated operational decision engine.

Lessons

  • Time-series framing is effective for revealing recurring pollution spikes and seasonality
  • Weak feature correlation can still be a useful analytical finding