Available for consulting

I build Data Platforms at scale.

Helping startups and scale-ups architect, build, and optimize data infrastructure for millions of users. From CDC pipelines to real-time analytics — I make your data work.

About Me

Engineering data infrastructure for the real world.

I'm Manikaran Kathuria — a Data Platform Engineer and Consultant with 6+ years of experience building scalable, reliable, and cost-efficient data infrastructure from scratch. Currently working as an SDE-3 across backend and data infra domains.

I specialize in helping startups and scale-ups solve their hardest data challenges: from designing CDC pipelines and real-time analytics stacks to optimizing billion-row databases and cutting infrastructure costs. My work sits at the intersection of systems engineering and data architecture.

M.Tech, IIT Delhi

Computer Science, 2019

6+ Years

Software & Data Infra

India

Working Remotely

Services

How I can help you.

Targeted consulting offerings for companies that need to build, fix, or scale their data infrastructure.

Data Platform Setup from Scratch

Problem

No existing data infrastructure or a fragmented, unreliable stack.

Approach

Design and build an end-to-end data platform with ingestion, processing, storage, and serving layers using proven open-source tools.

Outcome

A reliable, scalable data platform that grows with your product and team.

CDC & Real-Time Pipeline Architecture

Problem

Stale data, manual ETL, or broken sync between operational and analytics databases.

Approach

Implement Change Data Capture with Debezium, Kafka, Kinesis, and Flink, using custom offset management to keep real-time data flow reliable.

Outcome

Near-instant data freshness across all systems with zero-downtime syncing.
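
The offset-management idea above can be sketched in a few lines. This is a minimal, broker-free illustration of at-least-once checkpointing; `OffsetTracker`, `process_batch`, and the topic name are hypothetical stand-ins, not part of Debezium's or Kafka's actual APIs.

```python
# Sketch: commit an offset only after the downstream write succeeds, so a
# crash replays uncommitted records instead of losing them (at-least-once).
class OffsetTracker:
    """Tracks the next-to-read offset per (topic, partition)."""

    def __init__(self):
        self._committed = {}

    def committed(self, topic, partition):
        return self._committed.get((topic, partition), 0)

    def commit(self, topic, partition, offset):
        # Never move backwards; store the position *after* the processed record.
        key = (topic, partition)
        self._committed[key] = max(self._committed.get(key, 0), offset + 1)


def process_batch(tracker, topic, partition, records, sink):
    """Apply CDC records to a sink, advancing the offset after each write."""
    for offset, record in records:
        sink.append(record)                       # e.g. an Iceberg table commit
        tracker.commit(topic, partition, offset)  # checkpoint only on success


tracker = OffsetTracker()
sink = []
process_batch(tracker, "orders.cdc", 0, [(0, {"op": "c"}), (1, {"op": "u"})], sink)
# On restart, a consumer would resume from tracker.committed("orders.cdc", 0).
```

Committing after the sink write, rather than on read, is what makes the pipeline crash-safe: the worst case is a replay, never a gap.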

OLAP & Analytics Infrastructure

Problem

Slow dashboards, analytics queries timing out, or no self-serve analytics.

Approach

Deploy fast OLAP engines like StarRocks alongside data lakes (S3 + Iceberg) and dbt for transformation.

Outcome

Sub-second analytics queries and empowered teams making data-driven decisions.

Query Performance Optimization

Problem

Production queries running for minutes, slow API responses, or database bottlenecks.

Approach

Deep-dive into query plans, add targeted indexes, and implement partitioning and partitionwise aggregation strategies.

Outcome

Up to 1000x improvement in query performance. Measurable, provable results.
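
As a small illustration of how a targeted index changes a query plan, here is a self-contained sketch using SQLite's `EXPLAIN QUERY PLAN` (standing in for PostgreSQL's `EXPLAIN` so it runs anywhere); the table, column, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)")
cur.executemany(
    "INSERT INTO events (user_id, ts) VALUES (?, ?)",
    [(i % 100, "2024-01-01") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"

# Without an index, the planner falls back to a full table scan.
plan_before = [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]

# A targeted index on the filter column turns the scan into an index search.
cur.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]
```

On billion-row tables the same plan change is the difference between scanning the whole table and touching a handful of index pages, which is where order-of-magnitude wins come from.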

Data Cost Optimization

Problem

Cloud data bills growing faster than your revenue. Wasteful compute and storage.

Approach

Audit query patterns, optimize storage formats, right-size infrastructure, and implement intelligent caching.

Outcome

Hundreds of dollars saved daily without sacrificing performance or reliability.
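
One piece of the audit step can be sketched as a query-pattern roll-up: strip literals out of logged SQL so repeated query shapes group together, and the most frequent (and most cacheable) patterns surface. The `normalize` helper and the sample log below are illustrative, not a specific tool's output.

```python
import re
from collections import Counter


def normalize(sql):
    """Collapse string and numeric literals so repeated query shapes match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip().lower()


# A hypothetical slice of a query log.
log = [
    "SELECT * FROM events WHERE user_id = 7",
    "SELECT * FROM events WHERE user_id = 42",
    "SELECT COUNT(*) FROM orders WHERE day = '2024-01-01'",
]
patterns = Counter(normalize(q) for q in log)
```

Ranking `patterns` by count (or by count times average scan size) points directly at the queries worth caching or rewriting first.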

Observability & Reliability Engineering

Problem

Silent pipeline failures, missing data, or no visibility into system health.

Approach

Build comprehensive monitoring, alerting, and observability across the entire data stack.

Outcome

Proactive issue detection, faster incident resolution, and data you can trust.
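
A minimal example of one such check is a data-freshness monitor that alerts when the newest record in a pipeline is older than a threshold. The function and its thresholds are illustrative, not any particular monitoring tool's API.

```python
from datetime import datetime, timedelta, timezone


def freshness_alert(last_event_time, max_lag, now=None):
    """Return an alert message if the newest record is older than max_lag, else None."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_event_time
    if lag > max_lag:
        return f"STALE: no new data for {lag}"
    return None


# Fixed "now" so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ok = freshness_alert(now - timedelta(minutes=5), timedelta(minutes=15), now=now)
stale = freshness_alert(now - timedelta(hours=2), timedelta(minutes=15), now=now)
```

Wired into a scheduler and paging system, a check like this catches the silent failure mode where a pipeline stops producing data without ever throwing an error.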

Technical Stack

Tools I work with daily.

Streaming

Kafka · Kinesis · Flink · KCL · Debezium

Storage

S3 · Apache Iceberg · Hive · HDFS

Databases

PostgreSQL · MySQL · Redis · DynamoDB

Analytics

StarRocks · dbt · Databricks · Spark

Infrastructure

Terraform · Ansible · Docker · Kubernetes · AWS

Backend

Python · Java · Go · Node.js · REST / gRPC

Case Studies

Real impact. Real systems.

01

Built Data Lake + Real-Time Analytics Stack

Designed and built an end-to-end data lake architecture on S3 with Apache Iceberg, coupled with a real-time analytics stack using StarRocks. This replaced a fragmented system of ad-hoc queries and slow batch jobs.

S3 · Iceberg · StarRocks · Debezium · Kafka · dbt

Key Results

  • Real-time data freshness across all analytics
  • Self-serve analytics for the entire org
  • 90% reduction in data pipeline complexity

02

1000x Query Optimization in Production

Identified and optimized critical production queries on billion-row PostgreSQL tables through systematic analysis of query plans, strategic partitioning, composite indexing, and partitionwise aggregation strategies.

PostgreSQL · Indexing · Partitioning · Query Plans

Key Results

  • 1000x improvement in query execution time
  • API response times dropped from seconds to milliseconds
  • Eliminated database bottleneck for core product flows

03

Cost Optimization using Query Intelligence

Built a query intelligence layer that analyzed data access patterns, identified wasteful compute, optimized storage formats, and right-sized infrastructure across the entire data platform.

Cost Analysis · Storage Optimization · Infrastructure · Monitoring

Key Results

  • Hundreds of dollars saved daily
  • 40% reduction in compute costs
  • Zero performance degradation

Get in Touch

Let's build something great.

Have a data challenge? I'd love to hear about it. Whether you need a full platform build-out or a targeted optimization sprint, let's talk.