
IT Project Plan

Academic project for IT Projects course — Chernivtsi National University

🎯 50% — Faster Detection
🚀 95% — Model Accuracy
Zero-Downtime Deploy

🔍 Fraud Detection System

Project Proposal submitted to Mr. Marku
Reference: IT Projects Course Assignment

🎯 Use Case: Retrain Fraud Model on New Patterns

Aim: To reduce the time required to update the system with new data, ensuring high accuracy for legitimate users while minimizing downtime.

Overview: This process retrains the classifier to identify fresh fraudulent transaction types that the current production model misses. Automating it significantly reduces financial losses by shortening the delay between spotting new fraud patterns and deploying updated models.

👥 Actors

  • Project Manager — Accountable for approving and deploying models to production
  • ML Engineer — Responsible for training the Challenger model and comparing it against the Production model
  • MLOps Engineer — Manages model registration, deployment, performance monitoring, and rollbacks
  • Fraud Analyst (Subject Matter Expert) — Reviews and labels fraud cases, and conducts bias/fairness audits
  • Tech Lead — Provides technical oversight; consulted on the retrain & evaluate pipeline
  • Product Owner — Accountable for the final performance report and business value
  • Data Engineer — Pulls the latest datasets from the Feature Store
  • QA Engineer — Generates performance reports and validates model quality
  • Automated Pipeline (System) — Orchestrates training, evaluation, and deployment processes

📋 Pre-conditions

  • New labeled data: The Fraud Analyst has already tagged recent suspicious transactions as "confirmed fraud"
  • Dataset availability: This updated data is ready in the Feature Store

🔄 Scenario

  1. First, the Fraud Analyst finishes reviewing the fraud cases missed yesterday.
  2. The MLOps Engineer then triggers the "Retrain & Evaluate" pipeline.
  3. The System pulls the latest dataset from the Feature Store.
  4. A "Challenger" model is trained by the System to recognize these new patterns.
  5. The System compares this Challenger model against the active Production model (checking metrics like Recall and False Positive Rate).
  6. A report is generated showing that the new model detects the new fraud patterns without flagging legitimate users.
  7. Finally, the System registers the Challenger model as a release candidate.
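
The decision gate in steps 5–7 can be sketched in Python. This is a minimal illustration, not the project's actual pipeline code: `evaluate` and `promote_challenger` are hypothetical names, and the promotion rule mirrors the metrics named above (higher Recall, False Positive Rate within the <1% budget):

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    recall: float               # share of fraud cases caught
    false_positive_rate: float  # share of legitimate users wrongly flagged

def evaluate(tp: int, fn: int, fp: int, tn: int) -> EvalResult:
    """Compute Recall and FPR from confusion-matrix counts."""
    return EvalResult(
        recall=tp / (tp + fn) if tp + fn else 0.0,
        false_positive_rate=fp / (fp + tn) if fp + tn else 0.0,
    )

def promote_challenger(challenger: EvalResult, production: EvalResult,
                       max_fpr: float = 0.01) -> bool:
    """Register the Challenger only if it catches more fraud while
    keeping the false-positive budget (FPR < 1%)."""
    return (challenger.recall > production.recall
            and challenger.false_positive_rate <= max_fpr)

# Example: Challenger catches more fraud at an acceptable FPR
challenger = evaluate(tp=93, fn=7, fp=8, tn=992)   # recall 0.93, FPR 0.008
production = evaluate(tp=88, fn=12, fp=5, tn=995)  # recall 0.88, FPR 0.005
print(promote_challenger(challenger, production))  # True
```

In step 7, only a Challenger that passes this gate is registered as a release candidate.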

✅ Post-conditions

A new, more accurate model version is registered and staged, ready for zero-downtime deployment. The system maintains high availability while incorporating improved fraud detection capabilities.

🧪 Unit Testing Strategy

The unit tests verifying this scenario will be implemented in the next development phase. The full source code, including tests for model comparison and registration logic, will be hosted on GitHub. The repository link will be shared for code review once implementation is finalized.

Planned test coverage includes:

  • Data pipeline validation and Feature Store integration
  • Model training and evaluation metrics verification
  • Challenger vs Production model comparison logic
  • Model registry and versioning functionality
  • Deployment rollback mechanisms
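
As a preview of that strategy, the comparison logic can be covered with plain assertion tests (stdlib only, no framework required); `should_promote` here is a hypothetical stand-in for the pipeline's real decision function:

```python
def should_promote(ch_recall: float, ch_fpr: float,
                   prod_recall: float, max_fpr: float = 0.01) -> bool:
    """Stand-in for the pipeline's comparison logic: promote the
    Challenger only if Recall improves and FPR stays under budget."""
    return ch_recall > prod_recall and ch_fpr <= max_fpr

def test_better_recall_and_low_fpr_promotes():
    assert should_promote(0.93, 0.008, 0.88)

def test_worse_recall_keeps_production():
    assert not should_promote(0.85, 0.005, 0.88)

def test_fpr_over_budget_blocks_promotion():
    # Higher Recall alone is not enough if legitimate users get flagged
    assert not should_promote(0.95, 0.02, 0.88)

# pytest would discover these automatically; run them directly here
for test in (test_better_recall_and_low_fpr_promotes,
             test_worse_recall_keeps_production,
             test_fpr_over_budget_blocks_promotion):
    test()
```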

Yours sincerely,
Andrii Vlonha

Retrain Pipeline — Flow Diagram

Visual representation of the automated fraud model retraining and deployment pipeline.

```mermaid
flowchart TD
    A["🕵️ Fraud Analyst\nReviews & Labels Cases"] -->|"Confirmed fraud data"| B["🗄️ Feature Store\nUpdated with New Labels"]
    B --> C["⚙️ MLOps Engineer\nTriggers Retrain Pipeline"]
    C --> D["📥 System Pulls\nLatest Dataset"]
    D --> E["🤖 Train Challenger\nModel"]
    E --> F{"📊 Compare Metrics\nChallenger vs Production\nRecall & FPR"}
    F -->|"✅ Challenger Better"| G["📋 Register as\nRelease Candidate\n(MLflow)"]
    F -->|"❌ Challenger Worse"| H["⚠️ Keep Production\nModel & Alert Team"]
    G --> I["🚀 Zero-Downtime\nDeploy to Production\n(Canary / Blue-Green)"]
    I --> J["📡 Monitor Production\nGrafana Dashboard"]
    J -->|"🔁 Drift Detected"| A
    H -->|"Next Cycle"| A

    style A fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e
    style B fill:#f0fdf4,stroke:#16a34a,color:#14532d
    style C fill:#fef9c3,stroke:#ca8a04,color:#713f12
    style D fill:#f0fdf4,stroke:#16a34a,color:#14532d
    style E fill:#ede9fe,stroke:#7c3aed,color:#3b0764
    style F fill:#fff7ed,stroke:#ea580c,color:#431407
    style G fill:#d1fae5,stroke:#059669,color:#064e3b
    style H fill:#fee2e2,stroke:#dc2626,color:#7f1d1d
    style I fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
    style J fill:#f3e8ff,stroke:#9333ea,color:#3b0764
```

Project Kanban Board

Current sprint status for the Fraud Detection System — Retrain Pipeline & Production Rollout.

To Do

5
Deploy to Production (Zero-Downtime)
MLOps · DevOps
Set up Grafana Monitoring Dashboard
MLOps
Write Incident Runbooks
PM
A/B Shadow Testing in Live Traffic
QA · MLOps
Conduct Bias & Fairness Audit
QA · Data

Doing

3
Train Challenger Model on New Fraud Patterns
MLOps · Data
Compare Challenger vs Production (Recall & FPR)
MLOps · QA
Generate Performance Evaluation Report
QA · PM

Done

6
Kickoff Meeting Held
PM
Fraud Analyst Labeled New Fraud Cases
Data
Feature Store Updated with Labeled Data
Data
MLflow Model Registry Configured
MLOps
Retrain Pipeline Trigger Implemented
MLOps · DevOps
Stakeholder RACI Matrix Defined
PM

Key Elements of IT Project Planning — Applied to Fraud Detection System

Each element below is tailored to the Fraud Detection System project (retrain pipeline & production rollout).

Identify & Analyze Stakeholders — Owner: Project Manager · Category: Foundation
Purpose: Map everyone who influences or is impacted by the project to ensure proper engagement and avoid surprises.
  • Outputs: Stakeholder register + RACI / power-interest grid
  • Examples: Fraud Analysts, Compliance (GDPR), Product Owner, Legal, Security, MLOps/DevOps, End-users

Define Roles, Responsibilities & RACI — Owner: PM + Tech Lead · Category: Foundation
Purpose: Eliminate confusion — clearly define who owns what to streamline collaboration in fast-paced IT environments.
  • Outputs: RACI matrix, access rights matrix, escalation paths for issues
  • Examples: MLOps Engineer (Responsible for ML pipeline); Fraud Analyst (Accountable for labeling); Tech Lead (Consulted)

Hold Kickoff Meeting — Owner: Project Manager · Category: Launch
Purpose: Align the team on vision, scope, and processes to kickstart execution.
  • Outputs: Kickoff deck, shared success criteria, initial risks log
  • Examples: Demo current false positives/negatives; agree on labeling standards for new fraud patterns

Define Scope, Budget & Timeline — Owner: PM + PO · Category: Core
Purpose: Set firm boundaries to manage expectations and prevent overruns.
  • Outputs: Scope statement, budget breakdown, high-level roadmap
  • Examples: Scope: retrain classifier, evaluate models; Budget: $500/mo GPU; Constraints: latency <50ms

Deliverables & Acceptance Criteria — Owner: PO + Tech Lead · Category: Core
Purpose: Make success tangible by specifying outputs and how to verify them.
  • Outputs: Deliverables list with Definition of Done (DoD)
  • Examples: Pipeline code in Git; MLflow models; Grafana dashboard; tests pass & zero-downtime verified

Create Schedule & Milestones — Owner: Project Manager · Category: Execution
Purpose: Break down work into actionable steps with timelines.
  • Outputs: Gantt chart / Kanban board, sprint plan, milestones
  • Examples: W1: Analysts label cases; W2: MLOps triggers retrain; Milestone: Challenger model registered

Plan Resources & Team Capacity — Owner: PM + Tech Lead · Category: Execution
Purpose: Ensure availability of resources to avoid bottlenecks.
  • Outputs: Resource histogram, tooling list
  • Examples: 1 MLOps Engineer, 2 Fraud Analysts; Tools: MLflow, Airflow, Kubernetes; reserve 4 GPUs

Risk Assessment & Mitigation — Owner: PM + Security · Category: Control
Purpose: Identify and mitigate threats early to protect project outcomes.
  • Outputs: Risk register, mitigation / contingency plans
  • Examples: Data drift (monitor); labeling errors (SME review); model bias (audits); contingency: rollback

Quality & Success Metrics — Owner: Tech Lead + QA · Category: Control
Purpose: Establish benchmarks to ensure the system meets high standards.
  • Outputs: KPI dashboard, test strategy (unit, integration, A/B)
  • Examples: Precision 0.95, Recall 0.92, FPR <1%; A/B tests in shadow mode; success: 50% faster detection

Communication Plan — Owner: Project Manager · Category: Control
Purpose: Maintain transparency and quick issue resolution.
  • Outputs: Comms matrix, real-time dashboards, escalation paths
  • Examples: Daily standups; weekly reports; Slack alerts for pipeline failures; dashboards in Grafana

RACI Matrix for Fraud Detection System

Roles: R Responsible | A Accountable | C Consulted | I Informed
Columns: Task · Project Manager · ML Engineer · MLOps Engineer · Fraud Analyst · Tech Lead · Product Owner · Data Engineer · QA Engineer
Review and label fraud cases: I C I R A C C C I
Trigger retrain & evaluate pipeline: C R R A I C I I C
Pull latest dataset from Feature Store: I R C I I I R A
Train Challenger model: I R A C C C I I I
Compare Challenger vs Production model: C R C C A C I R
Generate performance report: C C C C C A I R
Register Challenger model: I R R A I C I C
Approve and deploy to production: C C R I R A A I R
Monitor production performance: C C R A R C C I C
Handle deployment rollbacks if needed: I C R A I R C I R

Project Priorities (Iron Triangle)

Primary Driver: Quality

In ML Fraud Detection, false negatives mean lost money, and false positives block real users. Scope & Quality are non-negotiable for a passing grade and business value.

Secondary Constraint: Deadline

The project is bound by the university academic calendar. The defense date is fixed, meaning timeline extensions are impossible.

Scope 45% · Time 35% · Cost 20%

Scope Actions (45%)

  • Train Challenger model with 95%+ Precision/Recall.
  • Build automated MLflow Retrain Pipeline.
  • Implement Zero-Downtime Blue/Green deploy.

Time Actions (35%)

  • Strict 4-Sprint lifecycle (2 weeks each).
  • Deliver Core Pipeline MVP by Sprint 2.
  • Final freeze 1 week before presentation.

Cost Actions (20%)

  • Cap AWS/GCP usage at $500/month.
  • Use Spot Instances for model training.
  • Utilize open-source tools (Grafana, MLflow).

Risk Assessment Template

Company name: IT Projects Dept
Date of next review: Sprint 2 End
Template columns: Hazard · Who might be harmed and how · Existing controls · Further action · Action owner · Action due · Status

Hazard: Customer insolvency (funding falls through)
  • Who might be harmed and how: Development team & agency — loss of expected revenue, unpaid working hours, and abrupt project cancellation
  • Existing controls: Regular monthly syncs with the client to assess their business health and project satisfaction
  • Further action: Require a 30% upfront advance payment before commencing the next project phase; pause execution if invoices are >15 days overdue
  • Owner: Project Manager / Finance · Due: Project start · Status: Done

Hazard: Data drift degrading model accuracy
  • Who might be harmed and how: Business — missed fraudulent transactions leading to direct financial loss; users — increased false positives
  • Existing controls: Data Scientists manually evaluate the previous week's batch transaction data for statistical deviations
  • Further action: Implement automated concept drift detection (e.g. evidentlyAI) within the MLflow pipeline to trigger auto-retraining alerts
  • Owner: MLOps Engineer · Due: Sprint 2 · Status: In progress

Hazard: Cloud compute (GPU) budget overrun
  • Who might be harmed and how: Company financials — exceeding the strict $500/month budget reduces overall project profitability
  • Existing controls: Basic AWS/GCP billing alerts trigger emails at 80% and 100% of the budget threshold
  • Further action: Move training workloads exclusively to Spot Instances and enforce auto-shutdown policies for idle GPU servers
  • Owner: DevOps Engineer · Due: Sprint 1 · Status: Done

Hazard: Critical spike in False Positive Rate (>1%)
  • Who might be harmed and how: Legitimate customers — payment rejections, account lockouts, and severe UX degradation leading to churn
  • Existing controls: Challenger model evaluated with standard train/test-split metrics on historical static datasets
  • Further action: Set up Grafana real-time alerts on live FPR metrics and mandate a shadow A/B testing phase before full traffic routing
  • Owner: QA / ML Engineer · Due: Sprint 3 · Status: In progress

Hazard: Production downtime during deployment
  • Who might be harmed and how: E-commerce platforms & users — unable to process real-time checkouts during the API outage window
  • Existing controls: Manual deployments scheduled exclusively during low-traffic night hours (3:00 AM) with manual rollback plans
  • Further action: Architect and test a Kubernetes-based blue-green deployment strategy ensuring zero-downtime updates
  • Owner: Tech Lead / DevOps · Due: Sprint 4 · Status: Planned

Hazard: GDPR / PII privacy violation
  • Who might be harmed and how: Company — heavy regulatory fines, legal action, and severe reputational damage
  • Existing controls: Raw transaction data access restricted exclusively to authorized senior Database Administrators
  • Further action: Implement automated data masking and hashing pipelines in the Feature Store before data reaches the ML training environment
  • Owner: Data Engineer / SecOps · Due: Sprint 2 · Status: In progress

Hazard: API inference latency >50 ms
  • Who might be harmed and how: End-users — a slow checkout process leading to cart abandonment and lower conversion rates
  • Existing controls: A simplified baseline model architecture (e.g. XGBoost) keeps prediction times naturally low
  • Further action: Optimize the final serialized model with ONNX Runtime or TensorRT to guarantee sub-50 ms execution
  • Owner: ML Engineer · Due: Sprint 4 · Status: Planned

Hazard: Unexpected departure of a key team member
  • Who might be harmed and how: Project timeline & team — severe delays in pipeline delivery and loss of critical domain/architectural knowledge
  • Existing controls: Daily stand-up meetings share current context, tasks, and blockers across the team
  • Further action: Enforce a "bus factor" policy requiring detailed runbooks, Architectural Decision Records (ADRs), and mandatory code reviews
  • Owner: Project Manager · Due: Sprint 1 · Status: Done

What the IT Project Should Produce — Expected Outcomes

Concrete deliverables and measurable results the Fraud Detection System project will produce upon successful completion.

🔁

Automated Retraining Pipeline

A fully automated CI/CD pipeline (Airflow + MLflow) that retrains the fraud classifier on new labeled data, evaluates the Challenger model, and registers it — with zero manual intervention.

⚡ 50% faster model updates
📦

Model Versioning & Registry

Every trained model is versioned and tracked in MLflow with metadata (metrics, parameters, artifacts). Rollback to any previous version is possible within minutes.

🔒 Full audit trail
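
MLflow handles versioning and stage transitions in the real project; purely to illustrate the idea (this sketch is not the MLflow API), a minimal in-memory registry could look like:

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    metrics: dict          # e.g. {"recall": 0.93, "fpr": 0.008}
    stage: str = "Staging" # Staging | Production | Archived

@dataclass
class ModelRegistry:
    """Minimal stand-in for an MLflow-style model registry."""
    versions: list = field(default_factory=list)

    def register(self, metrics: dict) -> ModelVersion:
        mv = ModelVersion(version=len(self.versions) + 1, metrics=metrics)
        self.versions.append(mv)
        return mv

    def promote(self, version: int) -> None:
        for mv in self.versions:
            if mv.stage == "Production":
                mv.stage = "Archived"   # keep old versions for rollback
        self.versions[version - 1].stage = "Production"

    def rollback(self) -> int:
        """Re-promote the most recently archived version."""
        previous = [mv for mv in self.versions if mv.stage == "Archived"][-1]
        self.promote(previous.version)
        return previous.version

registry = ModelRegistry()
registry.promote(registry.register({"recall": 0.88}).version)  # v1 live
registry.promote(registry.register({"recall": 0.93}).version)  # v2 replaces v1
registry.rollback()                                            # v1 live again
```

Because superseded versions are archived rather than deleted, rollback is a metadata change, which is what makes minutes-level recovery possible.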
📊

Real-time Monitoring Dashboard

Grafana dashboard tracking live model performance: Recall, False Positive Rate, prediction latency, and data drift alerts — visible to all stakeholders.

🎯 99.9% uptime SLA
🚀

Zero-Downtime Deployment

Blue-green or canary deployment strategy on Kubernetes ensures the production system stays online during model updates. Automatic rollback triggered if FPR degrades.

⏱️ <50ms prediction latency
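
A sketch of the automatic rollback rule: the sliding-window breach policy and the function name are illustrative assumptions; the FPR budget comes from the project's <1% target.

```python
def canary_decision(window_fpr: list[float], fpr_budget: float = 0.01,
                    breaches_allowed: int = 2) -> str:
    """Decide whether the canary model keeps receiving traffic.
    Rolls back once live FPR breaches the budget too many times."""
    breaches = sum(1 for fpr in window_fpr if fpr > fpr_budget)
    return "rollback" if breaches > breaches_allowed else "continue"

print(canary_decision([0.004, 0.006, 0.005]))        # continue
print(canary_decision([0.012, 0.015, 0.02, 0.011]))  # rollback
```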
📋

Governance & Documentation

Complete project documentation: RACI matrix, risk register, communication plan, incident runbooks, GDPR compliance checklist, and unit-tested source code on GitHub.

✅ GDPR compliant
🎯

Improved Fraud Detection Accuracy

The new model detects 95%+ of fraud cases including previously missed patterns, while keeping False Positive Rate below 1% — protecting legitimate users from being blocked.

📈 95% accuracy target

Agile Methodologies

How Scrum & Kanban work together in the Fraud Detection System project

🔀
This project uses a Scrum–Kanban hybrid

Scrum structures the development lifecycle into 4 time-boxed sprints with ceremonies (Planning, Review, Retrospective). Kanban visualises the day-to-day task flow on the board — controlling WIP limits and keeping the pipeline unblocked. Both run simultaneously throughout the project.

🔄 SCRUM — How We Use It in This Project

Scrum gives the team a structured, time-boxed lifecycle to build, evaluate and ship the fraud model in predictable increments

1

Product Backlog

All desired features and tasks for the fraud system, owned & prioritised by the Product Owner. Items are ordered by fraud risk impact and technical dependency.

  • Retrain pipeline automation
  • Grafana monitoring dashboard
  • GDPR data masking
  • Blue-green deployment
  • A/B shadow testing
  • Rollback mechanism
  • Drift detection (evidentlyAI)
  • ONNX model optimisation
2

Sprint Planning

Tech Lead (Scrum Master) + Product Owner + Dev Team select which backlog items to commit to for the next 2 weeks. The Sprint Goal is defined. Tasks are estimated in story points.

Example — Sprint 2 Planning Goal: "Deliver a registered Challenger model that outperforms Production on Recall & FPR using freshly labeled fraud data from the Feature Store."
3

Sprint Backlog

The committed subset of tasks for this Sprint. Each item becomes a Kanban card on the board with an assigned owner, tag (MLOps / Data / QA / PM) and progress tracker.

Sprint 2 Sprint Backlog example:
  • Fraud Analyst labels new fraud cases (Data)
  • Data Engineer updates Feature Store (Data)
  • ML Engineer trains Challenger model (XGBoost) (MLOps)
  • evidentlyAI drift detection integrated (MLOps)
  • GDPR masking pipeline verified (Data + SecOps)
4

Sprint 2 weeks

The team executes — cards move across the Kanban board every day. The Tech Lead runs a 15-minute Daily Standup every morning to surface blockers before they stall progress.

🟢
Done yesterday?
"Trained XGBoost Challenger — Recall 0.93 on test set"
🔵
Today's plan?
"Compare Challenger vs Production in MLflow, generate report"
🔴
Blockers?
"GPU quota exceeded — DevOps escalation needed"
5

Potentially Shippable Product Increment

At the end of each Sprint, the team delivers a working, tested, "Done" increment. The Definition of Done is enforced strictly:

✅ Challenger model trained & registered in MLflow
✅ Metrics beat Production (Recall ≥ 0.92, FPR < 1%)
✅ Code reviewed & merged to main branch
✅ Unit & integration tests passing in CI/CD
✅ GDPR compliance verified by Data Engineer
✅ Pipeline runs with zero manual steps
🔍

Sprint Review

Last day of Sprint · ~2 hours · All stakeholders attend

The team demos the working increment to stakeholders. Product Backlog is updated based on feedback received during the demo.

In this project:
  • Live demo of Challenger model metrics on Grafana dashboard
  • Fraud Analyst confirms new fraud patterns are correctly detected
  • Product Owner formally accepts or rejects the increment
  • Backlog reprioritised (e.g. A/B shadow testing moved up if FPR risk found)
  • Next Sprint scope agreed with all stakeholders
👥 Attendees: Full Scrum Team + PM + Fraud Analyst + Tech Lead
🔁

Sprint Retrospective

After Review · ~1.5 hours · Team only (no stakeholders)

The team reflects on how they worked, not what they built. Three questions drive continuous process improvement every sprint.

What went well?
"Automated pipeline trigger saved 3h of manual work in Sprint 2"
⚠️
What needs improvement?
"GPU quota blocker was discovered mid-sprint, too late"
🎯
What will we change?
"Add GPU usage check to Sprint Planning checklist — action: DevOps"
👥 Attendees: MLOps Eng · ML Eng · QA Eng · Data Eng · Tech Lead only

Sprint Timeline — 4 × 2 Weeks

Full project lifecycle — goals, key deliverables and milestones per Sprint

S1
Sprint 1 · Weeks 1–2
Foundation & Setup
  • Stakeholder register & RACI matrix defined
  • Feature Store schema designed
  • MLflow model registry configured
  • Retrain pipeline trigger implemented
  • AWS Spot Instances & budget alerts set up
  • Bus Factor runbooks started (ADRs)
🎯 Pipeline skeleton running end-to-end
S2
Sprint 2 · Weeks 3–4
Core ML Pipeline
  • Fraud Analyst labels new fraud cases
  • Feature Store updated with labeled data
  • Challenger model trained (XGBoost)
  • evidentlyAI drift detection integrated
  • GDPR data masking pipeline built
  • Challenger vs Production comparison logic
🎯 Challenger model registered in MLflow
S3
Sprint 3 · Weeks 5–6
Evaluation & QA
  • Challenger vs Production final comparison
  • A/B shadow testing on live traffic
  • Grafana FPR real-time alerts configured
  • QA performance evaluation report
  • Bias & fairness audit by Fraud Analyst
  • QA sign-off: Precision 0.95 / Recall 0.92
🎯 Model approved for production deployment
S4
Sprint 4 · Weeks 7–8
Deploy & Monitor
  • Blue-green deployment to Kubernetes
  • Grafana dashboard live for all stakeholders
  • Incident runbooks written & reviewed
  • ONNX model optimised (<50ms latency)
  • Final documentation & ADRs completed
  • 1-week code freeze before presentation
🏁 Zero-downtime deploy · Monitoring live

📋 KANBAN — How We Use It in This Project

Kanban runs inside every Sprint — it visualises and controls the daily task flow so the team always knows what to work on next

1

👁️ Visualise Every Task

Every item from the Sprint Backlog becomes a card on the Kanban board. Nothing is hidden — if it's not on the board, it's not being worked on. The MLOps Engineer, ML Engineer, Fraud Analyst and QA all update their cards after each Daily Standup.

2

🚦 WIP Limits Prevent Overload

Max 3 cards "In Progress" at once. This stops the ML Engineer from training three models simultaneously while completing none. When a card is blocked (e.g. GPU quota exceeded), it is flagged red and escalated at the next Daily Standup.
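
The WIP rule can be written as a guard that gates pulling a new card onto a column; the board contents and the `can_pull` helper are illustrative:

```python
WIP_LIMITS = {"To Do": 6, "In Progress": 3, "Review / QA": 2}

def can_pull(board: dict[str, list[str]], column: str) -> bool:
    """A card may enter `column` only while its WIP limit has headroom."""
    limit = WIP_LIMITS.get(column)
    return limit is None or len(board.get(column, [])) < limit

board = {
    "To Do": ["Label fraud cases", "Update Feature Store"],
    "In Progress": ["Train Challenger", "Drift detection", "GDPR masking"],
    "Review / QA": ["Validate Recall >= 0.92"],
}
print(can_pull(board, "In Progress"))  # False: the limit of 3 is reached
print(can_pull(board, "Review / QA"))  # True: one slot left
```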

3

⚡ Continuous Delivery After Sprint 4

Once the initial 4 Scrum sprints are complete, the fraud detection system switches to pure Kanban mode for ongoing operations. When evidentlyAI detects data drift → the retrain pipeline triggers automatically → the new model flows through the board → deployed without waiting for a sprint boundary.
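
In the project, evidentlyAI provides drift detection out of the box. For intuition only, a from-scratch Population Stability Index (PSI) check might look like this; the 0.2 threshold is a common rule of thumb, not a project requirement:

```python
from math import log

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference window and the
    current window of one feature (e.g. transaction amount)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            i = max(min(int((v - lo) / width), bins - 1), 0)  # clamp outliers
            counts[i] += 1
        return [(c or 0.5) / len(values) for c in counts]     # avoid log(0)

    return sum((a - e) * log(a / e)
               for e, a in zip(bin_shares(expected), bin_shares(actual)))

def drift_detected(reference: list[float], current: list[float],
                   threshold: float = 0.2) -> bool:
    """PSI above ~0.2 is a common rule-of-thumb signal of significant drift."""
    return psi(reference, current) > threshold
```

Run on each monitoring window, a breach of this check is what would fire the automatic retrain trigger described above.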

Board Column Definitions — Fraud Detection Pipeline
📥 Backlog

All tasks not yet in a sprint. Fed from Product Backlog. Prioritised by the PO based on fraud risk impact and business value.

No WIP limit
e.g. "Implement ONNX model optimisation"
📋 To Do

Sprint-committed items ready to be picked up. All dependencies are met — labeled data is available, access is granted.

WIP limit: 6
e.g. "Label new fraud cases (Fraud Analyst)"
⚙️ In Progress

Actively being worked on. Owner assigned, progress bar tracked. Blocked cards flagged red and escalated at next Standup.

WIP limit: 3 ← critical
e.g. "Train Challenger model — 70% done"
🔍 Review / QA

Built but awaiting verification — code review, model metric check by QA Engineer, or peer test of the pipeline logic.

WIP limit: 2
e.g. "QA validates Recall ≥ 0.92"
✅ Done

Meets the Definition of Done: tested, merged, documented, and either deployed or staged and ready for zero-downtime production deploy.

No WIP limit
e.g. "MLflow registry configured ✓"

Certifications & Achievements

Professional certifications and completed courses in MLOps, Cloud Computing, and Software Engineering.

All of the following are in progress (expected 2025):

  • MLOps Specialization — DeepLearning.AI
  • AWS Solutions Architect — Amazon Web Services
  • Kubernetes Administrator (CKA) — Cloud Native Computing Foundation
  • Docker & Containerization — Docker Inc.
  • Python for Data Science — DataCamp / Coursera
  • Machine Learning Engineering — DeepLearning.AI
  • Terraform & Infrastructure as Code — HashiCorp
  • CI/CD & DevOps Fundamentals — GitHub / GitLab

⚠ Temporary Section — This page was created specifically for the IT Projects course (Mr. Marku) and will be removed after the course ends.