AI Practice - AI and Data Science Blog

Harnessing the harness

On building your own multi-agent orchestrator, and why owning the infrastructure around AI matters.

Video Generation Landscape Analysis: The Road to Informative Video

We tested 2026 SOTA models and found a "usability gap".

Yes, you’re absolutely right… Right? A mini survey on LLM sycophancy

Ever spoken to an AI and felt like it was responding with insincere praise?

The Realities of Robot Deployment: What It Takes for Embodied AI to Succeed

The robot "hype" ignores the unstructured-environment problem.

MetaEvaluator: Systematically Evaluate Your LLM Judges

Measure how well your app is performing and, more importantly, where it's failing.

Machine Learning

MLOps Transformation: Moving from Stage 0 to Stage 3 (Part II)

A maturity roadmap and a cultural shift.

Building for Agentic AI - Agent SDKs & Design Patterns

The true value of AI agents lies in loops and self-correction rather than raw reasoning power.

A deeper look into using MCP in the enterprise

A universal "USB-C" for AI?

Machine Learning

Building MLOps Bridges: Our Journey in Uplifting Agencies

A practical guide to MLOps adoption across Government teams.

Building a Better RAG Pipeline for HR Policy Q&A: What Worked and What Didn’t

We tested the most effective approaches.

Benchmarking GPT-5 & GPT-OSS: A Responsible AI Approach

Evaluating dimensions often overlooked by traditional benchmarks.

“The Bots Are Here. Now What?” How Knowledge Management Became the Key to Powering GenAI Solutions

Available LLMs are powerful enough. What we are missing is the knowledge to fuel them.

Introducing LionGuard 2: Multilingual LLM Guardrail for Singapore

We improved its coverage and robustness.

RabakBench: Multilingual AI Safety Evaluation Made Local

Global safety guardrails are often blind to local dialects and sensitivities.

Validating Annotation Agreement between Humans and LLMs

Who Judges the Judge? At GovTech’s AI Practice, we’ve been embracing what’s known as “LLM-as-a-judge”, essentially employing LLMs as evaluators across our AI workflows. This has become a powerful approach in our evaluation toolkit. We use LLMs extensively across multiple areas: judging other LLM outputs (e.

Does your LLM know when to say “I don’t know”?

A model's refusal to answer may sometimes be more valuable than an answer.

Fine-Tuning Language Models for Long-Context Data: Automated Stance Analysis of Citizen Discussions

Addressing the technical challenges of processing high-volume public feedback for policy-making.

Machine Learning

MLOps Transformation: Moving from Stage 0 to Stage 3 (Part I)

As much a cultural shift as a technical one.

Evaluating MOE’s SLS Learning Assistant: Using Synthetic Data and LLMs to Benchmark Faithfulness and Factuality

Safer, faster testing of student-facing AI before real-world deployment.

From Infrastructure to Intelligence (Part 1): Strategic Foundations for AI Model Hosting and Agent-Based Architectures on the Cloud

What began as simple chatbot prototypes has evolved into full-fledged agent architectures.

The other side of Agentic AI

An agent's utility is capped by its environment interface rather than just its reasoning capabilities.

Securing Guardrails with Automated Red Teaming

Manual testing is no longer scalable.

(Part 2) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods

Safety must be "baked in".

(Part 1) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods

The process of "teaching" models to be safe.

Eliciting Toxic Singlish from r1

A red-teaming exercise showing that even "reasoning" models can be coaxed into toxic output.