Phani Gudipally

Data Engineering Leader

Manager, Data Engineering

I’m a data engineering leader focused on reliable, AI-ready data platforms. I design scalable pipelines, LLM-friendly models, and self-serve analytics that work for both technical and business teams. Here I share notes on data engineering, AI-native infrastructure, and lessons from building high-trust teams in fast-moving environments.

Manager, Data Engineering at Shopify, leading revenue data platform architecture and cross-functional analytics initiatives.

Data Engineering • AI-ready Analytics • Leadership • Data Architecture • Analytics Engineering • Data Governance • Streaming Data • Data Platforms
  • 14+ years building data platforms and analytics products
  • 4+ years leading global engineering teams
  • Supported 2,000+ users with data platforms at Amazon and Shopify
  • Designed AI-native data architecture enabling self-serve analytics at scale
  • Defined AI-native analytics workflows with curated context bundles, validation gates, and audit logging
  • Built AI-native data modeling standards with canonical entities, rich metadata, and compact semantic layers
  • Developed agent-ready data pipeline patterns for RAG: cleaning, enrichment, chunking, embeddings, and retrieval tests
  • Mapped dbt models to AI-ready datasets and retrieval-friendly marts for agent systems

Skills

Strategic leadership and hands-on platform engineering across modern data stacks.

Streaming and Event-Driven Systems

Kafka, Kinesis, Spark Streaming, and Flink for low-latency ingestion, real-time analytics, and alerting.

Data Quality and Observability

Great Expectations, Monte Carlo, and custom SLA monitoring for reliable, audit-ready datasets.

Cloud Platforms and Lakehouse

AWS, GCP, Databricks, Delta Lake, and Apache Iceberg for scalable analytics and AI-ready storage.

Leadership and Strategy

Team building, mentorship, and roadmap ownership. Align engineering execution with business goals across remote and global teams.

Data Modeling, Warehousing, and Analytics Engineering

Scalable data architecture, dimensional modeling, metric standardization, self-serve analytics. Hands-on with Snowflake, Redshift, BigQuery.

AI and LLM Tools

OpenAI and Perplexity APIs, LangChain, MCP servers, automated SQL generation, and AI agents.

ETL/ELT and Orchestration

Pipeline development using Airflow and dbt with production-grade observability and testing.

Data Governance and BI

Data quality, metadata management, lineage, and compliance. BI with Looker, Tableau, and Superset.

Experience

Building resilient data ecosystems, AI-ready analytics, and high-performing teams.

Manager, Data Engineering

Shopify • Revenue Data Engineering • Seattle

Jan 2024 - Current
  • Built Shopify's first Revenue Data Engineering team from the ground up, supporting a $5B+ revenue pipeline.
  • Architected the revenue data platform using dbt, Airflow, and cloud data services.
  • Led cross-functional design for analytics platform migrations across BigQuery, Snowflake, and Databricks.
  • Developed AI-native data models and MCP servers to streamline self-serve analytics and enrichment.
  • Built AI agents for lead enrichment using OpenAI and Perplexity APIs to improve qualification accuracy.
  • Mentored engineers and established best practices in testing, modeling, and automation.

Manager, Data Engineering

Amazon • Product Compliance, Compensation • Seattle

Mar 2020 - Sep 2023
  • Led architecture for large-scale, cloud-native data infrastructure supporting 2,000+ users.
  • Built platforms on AWS Data Lake, Redshift, EMR, S3, Glue, Lambda, DynamoDB, and Athena.
  • Reduced compliance investigation cycle times by 80% through automated pipelines and self-service analytics.
  • Optimized storage and compute costs with tiered storage, tagging, and observability-driven scaling.
  • Mentored and coached a team of 6 engineers with a focus on ownership and delivery.

Senior Data Engineer

Amazon • Alexa AI, Amazon Devices, Product Compliance • Seattle

2015 - 2020
  • Delivered large-scale data solutions with Redshift, Kinesis, S3, EC2, RDS, Glue, EMR, and Lambda.
  • Built cross-functional relationships with data scientists, product managers, and software engineers.
  • Developed reliable ETL/ELT pipelines and optimized models for analytics and reporting.
  • Automated reporting workflows to enable self-serve analytics for internal stakeholders.

Senior Data Analyst

PayPal - LatentView Analytics • Digital Marketing Analytics • San Jose

2010 - 2014
  • Translated business requirements into analytical deliverables aligned to objectives.
  • Designed and ran campaign-effectiveness analyses to grow market share and revenue.
  • Built executive dashboards using Tableau for marketing leadership.

Featured Writing

Notes on data strategy, modern stacks, and AI-native analytics.