Available for consulting

I build Data Platforms at scale.

Helping startups and scale-ups architect, build, and optimize data infrastructure for millions of users. From CDC pipelines to real-time analytics — I make your data work.

About Me

Engineering data infrastructure for the real world.

I'm Manikaran Kathuria — a Data Platform Engineer and Consultant with 6+ years of experience building scalable, reliable, and cost-efficient data infrastructure from scratch. Currently working as an SDE-3 across backend and data infra domains.

I specialize in helping startups and scale-ups solve their hardest data challenges: from designing CDC pipelines and real-time analytics stacks to optimizing billion-row databases and cutting infrastructure costs. My work sits at the intersection of systems engineering and data architecture.

M.Tech, IIT Delhi

Computer Science, 2019

6+ Years

Software & Data Infra

India

Working Remotely

Services

How I can help you.

Targeted consulting offerings for companies that need to build, fix, or scale their data infrastructure.

Data Platform Setup from Scratch

Problem

No existing data infrastructure or a fragmented, unreliable stack.

Approach

Design and build an end-to-end data platform with ingestion, processing, storage, and serving layers using proven open-source tools.

Outcome

A reliable, scalable data platform that grows with your product and team.

CDC & Real-Time Pipeline Architecture

Problem

Stale data, manual ETL, or broken sync between operational and analytics databases.

Approach

Implement Change Data Capture with Debezium, Kafka, Kinesis, and Flink, using custom offset management to keep real-time data flow reliable.

Outcome

Near-instant data freshness across all systems with zero-downtime syncing.
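
The offset-management idea above can be sketched in a few lines. This is a minimal, broker-free illustration of at-least-once checkpointing; `OffsetTracker`, `process_batch`, and the topic name are hypothetical stand-ins, not part of Debezium's or Kafka's actual APIs.

```python
# Sketch: commit an offset only after the downstream write succeeds, so a
# crash replays uncommitted records instead of losing them (at-least-once).
class OffsetTracker:
    """Tracks the next-to-read offset per (topic, partition)."""

    def __init__(self):
        self._committed = {}

    def committed(self, topic, partition):
        return self._committed.get((topic, partition), 0)

    def commit(self, topic, partition, offset):
        # Never move backwards; store the position *after* the processed record.
        key = (topic, partition)
        self._committed[key] = max(self._committed.get(key, 0), offset + 1)


def process_batch(tracker, topic, partition, records, sink):
    """Apply CDC records to a sink, advancing the offset after each write."""
    for offset, record in records:
        sink.append(record)                       # e.g. an Iceberg table commit
        tracker.commit(topic, partition, offset)  # checkpoint only on success


tracker = OffsetTracker()
sink = []
process_batch(tracker, "orders.cdc", 0, [(0, {"op": "c"}), (1, {"op": "u"})], sink)
# On restart, a consumer would resume from tracker.committed("orders.cdc", 0).
```

Committing after the sink write, rather than on read, is what makes the pipeline crash-safe: the worst case is a replay, never a gap.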

OLAP & Analytics Infrastructure

Problem

Slow dashboards, analytics queries timing out, or no self-serve analytics.

Approach

Deploy fast OLAP engines like StarRocks alongside data lakes (S3 + Iceberg) and dbt for transformation.

Outcome

Sub-second analytics queries and empowered teams making data-driven decisions.

Query Performance Optimization

Problem

Production queries running for minutes, slow API responses, or database bottlenecks.

Approach

Deep-dive into query plans, add targeted indexes, and implement partitioning and partitionwise aggregation strategies.

Outcome

Up to 1000x improvement in query performance. Measurable, provable results.
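
As a small illustration of how a targeted index changes a query plan, here is a self-contained sketch using SQLite's `EXPLAIN QUERY PLAN` (standing in for PostgreSQL's `EXPLAIN` so it runs anywhere); the table, column, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)")
cur.executemany(
    "INSERT INTO events (user_id, ts) VALUES (?, ?)",
    [(i % 100, "2024-01-01") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"

# Without an index, the planner falls back to a full table scan.
plan_before = [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]

# A targeted index on the filter column turns the scan into an index search.
cur.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]
```

On billion-row tables the same plan change is the difference between scanning the whole table and touching a handful of index pages, which is where order-of-magnitude wins come from.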

Data Cost Optimization

Problem

Cloud data bills growing faster than your revenue. Wasteful compute and storage.

Approach

Audit query patterns, optimize storage formats, right-size infrastructure, and implement intelligent caching.

Outcome

Hundreds of dollars saved daily without sacrificing performance or reliability.
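
One piece of the audit step can be sketched as a query-pattern roll-up: strip literals out of logged SQL so repeated query shapes group together, and the most frequent (and most cacheable) patterns surface. The `normalize` helper and the sample log below are illustrative, not a specific tool's output.

```python
import re
from collections import Counter


def normalize(sql):
    """Collapse string and numeric literals so repeated query shapes match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip().lower()


# A hypothetical slice of a query log.
log = [
    "SELECT * FROM events WHERE user_id = 7",
    "SELECT * FROM events WHERE user_id = 42",
    "SELECT COUNT(*) FROM orders WHERE day = '2024-01-01'",
]
patterns = Counter(normalize(q) for q in log)
```

Ranking `patterns` by count (or by count times average scan size) points directly at the queries worth caching or rewriting first.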

Observability & Reliability Engineering

Problem

Silent pipeline failures, missing data, or no visibility into system health.

Approach

Build comprehensive monitoring, alerting, and observability across the entire data stack.

Outcome

Proactive issue detection, faster incident resolution, and data you can trust.
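
A minimal example of one such check is a data-freshness monitor that alerts when the newest record in a pipeline is older than a threshold. The function and its thresholds are illustrative, not any particular monitoring tool's API.

```python
from datetime import datetime, timedelta, timezone


def freshness_alert(last_event_time, max_lag, now=None):
    """Return an alert message if the newest record is older than max_lag, else None."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_event_time
    if lag > max_lag:
        return f"STALE: no new data for {lag}"
    return None


# Fixed "now" so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ok = freshness_alert(now - timedelta(minutes=5), timedelta(minutes=15), now=now)
stale = freshness_alert(now - timedelta(hours=2), timedelta(minutes=15), now=now)
```

Wired into a scheduler and paging system, a check like this catches the silent failure mode where a pipeline stops producing data without ever throwing an error.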

Technical Stack

Tools I work with daily.

Streaming

Kafka · Kinesis · Flink · KCL · Debezium

Storage

S3 · Apache Iceberg · Hive · HDFS

Databases

PostgreSQL · MySQL · Redis · DynamoDB

Analytics

StarRocks · dbt · Databricks · Spark

Infrastructure

Terraform · Ansible · Docker · Kubernetes · AWS

Backend

Python · Java · Go · Node.js · REST / gRPC

Case Studies

Real impact. Real systems.

01

Built Data Lake + Real-Time Analytics Stack

Designed and built an end-to-end data lake architecture on S3 with Apache Iceberg, coupled with a real-time analytics stack using StarRocks. This replaced a fragmented system of ad-hoc queries and slow batch jobs.

S3 · Iceberg · StarRocks · Debezium · Kafka · dbt

Key Results

  • Real-time data freshness across all analytics
  • Self-serve analytics for the entire org
  • 90% reduction in data pipeline complexity

02

1000x Query Optimization in Production

Identified and optimized critical production queries on billion-row PostgreSQL tables through systematic analysis of query plans, strategic partitioning, composite indexing, and partitionwise aggregation strategies.

PostgreSQL · Indexing · Partitioning · Query Plans

Key Results

  • 1000x improvement in query execution time
  • API response times dropped from seconds to milliseconds
  • Eliminated database bottleneck for core product flows

03

Cost Optimization using Query Intelligence

Built a query intelligence layer that analyzed data access patterns, identified wasteful compute, optimized storage formats, and right-sized infrastructure across the entire data platform.

Cost Analysis · Storage Optimization · Infrastructure · Monitoring

Key Results

  • Hundreds of dollars saved daily
  • 40% reduction in compute costs
  • Zero performance degradation

Get in Touch

Let's build something great.

Have a data challenge? I'd love to hear about it. Whether you need a full platform build-out or a targeted optimization sprint, let's talk.