Executive Summary

I’m an AWS-Certified Data Engineer with over 3 years of experience designing scalable ETL pipelines and data solutions on AWS and Azure platforms. My expertise spans cloud data warehousing, SQL optimization, Power BI dashboards, and big data tools like Spark and Hadoop. I’m passionate about transforming raw data into actionable insights and building high-performance, reliable data infrastructure.

Dinesh Saud Profile

Work History

Professional experience as a Data Engineer, specializing in cloud data solutions, analytics, and business impact.

Dec 2021 – Jul 2023
Data Engineer I, Cotiviti
Remote / Toronto, ON
AWS · Redshift · HDFS · Python · Hive · Lambda · Databricks
  • Built scalable data lake workflows using Cloudera HDFS and AWS S3 for real-time healthcare claims processing.
  • Engineered ETL pipelines handling 2B+ records/month, supporting AWS-based MDM systems using Redshift and improving data standardization.
  • Streamlined data extraction with Hive and HDFS CLI, reducing processing time by 35% and increasing reliability by 40%.
  • Benchmarked Spark, Hadoop, and Hive pipelines to optimize operational reporting.
  • Delivered HIPAA-compliant, metadata-aware solutions for data lineage tracking and QA.
Jan 2021 – Dec 2021
Research Assistant, Herald College Kathmandu
Kathmandu, Nepal
PySpark · Spark · Hadoop · Hive · Kafka · Data Engineering · Benchmarking · Cluster Configuration
  • Benchmarked Spark, Hadoop, and Hive pipelines on 500GB+ datasets to evaluate scalability and latency across diverse workloads.
  • Implemented 20+ PySpark scripts for data cleaning and transformation, improving pipeline efficiency by 40%.
  • Evaluated batch vs. streaming architectures using Kafka and Spark Structured Streaming, reducing processing latency by 25%.
  • Configured a local multi-node Hadoop cluster to support reproducible experimentation and parallel workload testing.
Jun 2020 – Oct 2021
Software Engineer Intern, Vox Crow Pvt. Ltd.
Remote / Kathmandu, NP
PHP · MySQL · Backend Development · CRM · Agile · Data Integration
  • Developed and deployed backend CRM features using PHP and MySQL for 5+ client projects.
  • Optimized SQL queries and resolved 15+ complex data integration issues.
  • Released stable, business-aligned modules in agile sprints, improving system reliability and reducing support requests.

Other Experience

Additional roles that strengthened leadership, customer focus, and applied ML skills.

2023 – Present
Assistant Manager - Rental Operations, Budget Rental Car
Barrie, ON, CA
  • Managed daily branch operations including rentals/returns, fleet status, and front‑desk service.
  • Met and exceeded sales targets through effective upselling of protection packages and vehicle upgrades.
  • Supported team onboarding and process adherence to maintain consistent customer experience.
  • Handled cash balancing and end‑of‑day reconciliation with accurate reporting.
6‑Month Engagement
ML Engineer - College Contract, Georgian College - Trusti Diagnostics
Barrie, ON, CA
  • Built image‑classification pipelines with PyTorch (ResNet/EfficientNet), including preprocessing and custom Dataset/DataLoader.
  • Implemented training/validation loops with accuracy & loss visualization; optimized throughput with batching and mixed precision.
  • Delivered reproducible code, experiment reports, and deployment‑ready artifacts aligned to stakeholder requirements.

Projects

Highlighted portfolio projects in Data Engineering, AI, and Analytics.

  • Apache Spark End-to-End Data Engineering

    Designed and deployed a scalable ETL pipeline using Apache Spark, integrating multiple data sources for batch and real-time analytics.

    View on GitHub
  • Azure Healthcare ETL

    Built a HIPAA-compliant data pipeline on Azure, leveraging Data Factory and Synapse Analytics for healthcare claims processing.

    View on GitHub
  • Trusti-AI

    PyTorch-based AI system covering preprocessing, training, and evaluation of ResNet/EfficientNet models, with GPU support.

    View on GitHub
  • Power BI Sales Analytics

    Developed interactive dashboards and KPIs to analyze sales trends, forecast revenue, and support business decision-making.

    View on GitHub
  • Plant-Disease-Recognition

    Computer vision model that detects plant diseases using CNN architectures, intended to support crop health monitoring and agricultural productivity.

    View on GitHub
  • Pocl_temporalGNN-FraudDetection

    Implemented a temporal Graph Neural Network to detect fraudulent transactions with high precision on financial datasets.

    View on GitHub

Let’s Connect

If you have interesting ideas, please don’t hesitate to connect with me.

Let’s build with data

I’m Dinesh Saud, a Toronto-based data professional focused on cloud-native ETL, analytics, and ML. This site highlights outcomes from real projects—streaming pipelines, healthcare ETL, and production-ready models. If something here sparks an idea, I’d love to chat.

Contact Details

Address
1024 Pharmacy Ave • Scarborough • Canada
Phone
+1 249-989-4930
Email
dinesh.4777saud@gmail.com