~ git:(main) man naveen
NAVEEN(1)                            General Commands Manual                            NAVEEN(1)

NAME
       naveen — turn chaos into uptime

SYNOPSIS
       naveen [-d | --debug] [-k | --kernel] [-f | --fire] system
       naveen [-c | --cost] [-o | --optimize] aws_account ...
       naveen [-a | --architect] [-s | --scale] platform teams ...
       naveen [-r | --recover] [-t deadline] corrupted_data angry_client
       naveen [-S | --secure] [--zero-trust] vpc iam secrets ...
       naveen --why-is-this-broken production

DESCRIPTION
       I fix systems that are on fire and build platforms that don't catch fire.
       Currently at Nielsen. Previously Flexcar and OYO.

INCIDENTS RESOLVED
       $100M contract recovery
              28 days. 10TB. 7 pipelines. 100% accuracy. Zero SLA breaches.

       cgroups_v2_oomkill
              EKS upgrade changed memory accounting. Page cache + kernel memory now counted toward limits.

       efs_martian_packets
              VPC CIDR overlap + rp_filter = silent drops. One node in hundreds. Found via dmesg.

       nfs_loopback_deadlock
              Hard mount + dead server = D-state forever. Threads stuck in uninterruptible sleep.

       etl_cascade_corruption
              One missing staging folder corrupted 28 days across 7 RT pipelines. Incremental systems assume continuity.

       o_n_squared_hot_path
              Linear scan inside loop. 29 billion comparisons. 78 min runtime. Replaced with indexed lookup. 97% reduction.

       connection_pool_exhaustion
              Missing max-lifetime. Connections never recycled. Pool exhausted over days.

       vpc_endpoint_sg_drift
              Two teams made changes. Private DNS override + missing SG rule. Silent API timeouts.

       ssl_proxy_interception
              Corporate proxy MITM + missing CA in truststore = certificate validation failed.

       transitive_dependency_mismatch
              JDBC driver upgrade pulled Scala 2.13 artifacts onto a Spark classpath built for Scala 2.12. Class loading failures.

       kubernetes_ip_exhaustion
              CNI warm pool defaults + auto-scaling = subnet exhausted by reservations, not pods.

       base_image_eol
              Pinned OS version EOL. Package repos disappeared. All builds failed.

       stale_lookup_race
              Calendar table max date in past + MAX+1 key generation = intermittent NULL failures.
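
       The o_n_squared_hot_path entry above comes down to swapping a scan inside a loop
       for a lookup built once. A minimal sketch of that pattern, assuming hypothetical
       record and lookup rows shaped as dicts with a "key" field (the real data model is
       not described here):

```python
def match_linear(records, lookups):
    """O(n*m): rescans the whole lookup list for every record."""
    out = []
    for rec in records:
        for row in lookups:
            if row["key"] == rec["key"]:
                out.append((rec["id"], row["value"]))
                break  # first match wins
    return out


def match_indexed(records, lookups):
    """O(n+m): build a dict index once, then probe it per record."""
    index = {}
    for row in lookups:
        # setdefault keeps the first row per key, matching the scan above
        index.setdefault(row["key"], row["value"])
    return [
        (rec["id"], index[rec["key"]])
        for rec in records
        if rec["key"] in index
    ]


# Hypothetical sample data: 10 records, lookup rows for keys 0..3 only
records = [{"id": i, "key": i % 5} for i in range(10)]
lookups = [{"key": k, "value": f"v{k}"} for k in range(4)]
```

       Both functions return the same pairs; only the number of comparisons changes.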


COST SAVINGS
       $560K+/year total
              Traffic-based autoscaling ($90K). ETL redesign ($50K). AWS governance ($360K). Infrastructure right-sizing ($50K).

       traffic_autoscaling
              Analyzed 27 days of traffic. 261K requests. 3-tier CronJob scaling. 43% compute reduction.

       etl_platform_redesign
              85% cost reduction. Right-sized Spark profiles per command. Extract/Ingest no longer using Transform-sized clusters.

       aws_governance
              14 dashboards. ML anomaly detection. Eliminated $30K/month spike. 20% cost reduction target.

       output_formatter_fix
              O(n⁴) to O(1). 2hr to 5min. 97% compute reduction. 29 billion operations eliminated.

SYSTEMS BUILT
       async_job_orchestration
              1000+ concurrent jobs. Atomic claiming with optimistic locking. Heartbeat monitoring. Autoscaling workers.

       7_system_reconciliation
              Cross-platform DQC. 7 systems integrated. Billions of records. Sub-hour deviation detection.

       etl_platform_redesign
              97 Scala files analyzed. 28 critical issues. Parent-child execution model. 85% cost reduction.

       real_time_analytics
              Real-time analytics on Druid. Petabyte scale. Sync/async routing. Hot/warm/cold storage tiers.

       eks_fleet_migration
              1.23 to 1.33 across 4 teams. Reverse-engineered NodeConfig. Filled AWS documentation gaps.
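
       One way to read "atomic claiming with optimistic locking" in async_job_orchestration
       is a version-checked UPDATE: a worker claims a job only if the row is unchanged since
       it was read. A minimal sketch using SQLite from the Python standard library; the jobs
       table, its columns, and the function names are assumptions for illustration, not the
       production schema:

```python
import sqlite3


def make_db():
    """Create an in-memory jobs table (hypothetical schema) with two pending jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, "
        "owner TEXT, version INTEGER)"
    )
    conn.executemany(
        "INSERT INTO jobs (id, status, owner, version) VALUES (?, 'pending', NULL, 0)",
        [(1,), (2,)],
    )
    conn.commit()
    return conn


def read_pending(conn):
    """Return (id, version) of the oldest pending job, or None."""
    return conn.execute(
        "SELECT id, version FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()


def try_claim(conn, job_id, seen_version, worker_id):
    """Claim the job only if its version still matches what this worker read.

    The WHERE clause is the optimistic lock: a stale reader updates zero rows.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'running', owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND status = 'pending'",
        (worker_id, job_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

       If two workers read the same (id, version), the first try_claim succeeds and the
       second returns False, so the loser goes back to read_pending instead of blocking.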

ENVIRONMENT
       $LANG        Java, Python, Scala, Go, Bash
       $DATA        Kafka, Spark, Druid, Airflow, PostgreSQL, DynamoDB
       $MACHINES    AWS (VPC, IAM, EKS, FSx, Compute, Security), Linux (storage, networking, monitoring)
       $INFRA       Kubernetes, Terraform, Helm, GitLab CI/CD
       $OBSERVE     Grafana, Prometheus, CloudWatch, dmesg, strace

EXIT CODES
       0            system recovered, client happy, contract saved
       1            found the bug, mass-produced documentation
       137          OOMKilled — but now I know why
       139          segfault traced, core dump analyzed
       143          SIGTERM caught, graceful shutdown achieved
       255          kernel said no, I said watch me

SEE ALSO
       github.com/nkr-ops
       linkedin.com/in/naveenkumarreddyk

HISTORY
       2024–          Nielsen
       2022–2024      Flexcar
       2021–2022      OYO

BUGS
       Debugs problems that aren't assigned. Writes too much documentation.

                                         2026-01-05                                    NAVEEN(1)