Building a Scalable Customer Data Platform on AWS Lakehouse

  • Industry: Media
  • Vertical: Big Data, AWS Lakehouse
  • Location: Singapore
  • Completion Date: 28-07-2024

Executive Summary

A leading media organization set out to build a highly scalable, robust Customer Data Platform (CDP) on Amazon Web Services (AWS). The project aimed to implement a Lakehouse architecture, automate data processes, improve data accessibility, and ensure compliance with data privacy regulations. This case study describes the challenges the organization faced, the solution implemented, the technologies used, and the benefits achieved.

Challenge

The organization, which manages a diverse portfolio of media assets, recognized the need to consolidate and manage its customer data more effectively. To achieve this, it embarked on a project to build a modern CDP on AWS, using cloud-native technologies to create a flexible and scalable platform. Its existing data management practices presented several key challenges:

  • Data Silos: Customer data was scattered across multiple systems, making it difficult to get a unified view.
  • Manual Processes: Data ingestion and transformation were largely manual, leading to inefficiencies and potential errors.
  • Limited Scalability: The existing infrastructure struggled to handle growing data volumes and increasing user demands.
  • Data Accessibility: Providing data access to various business units was challenging and time-consuming.
  • Compliance Risks: Ensuring compliance with evolving data privacy regulations required a more robust solution.

Solution

The OktaBytes team helped the organization develop a comprehensive Customer Data Platform with the following features:
  • Lakehouse Architecture: A multi-layered Lakehouse architecture (Raw, Bronze, Silver) for organized data storage and processing.
  • Automated Data Ingestion: Ingestion of both batch and real-time data from multiple sources, including SFTP, databases, and APIs.
  • Automated Data Transformation: Data cleansing, normalization, and enrichment processes to ensure data quality.
  • Data Orchestration: Data pipelines orchestrated with AWS Step Functions and AWS Lambda for automation and reliability.
  • API Development: APIs for secure data access and integration with other systems.
  • Infrastructure Automation: Infrastructure as Code (IaC) using Terraform for automated provisioning and management.
  • Monitoring and Logging: Comprehensive monitoring and logging with Amazon CloudWatch to track system performance and errors.
  • Notifications: Amazon SNS and Slack notifications for alerting and event handling.
  • Security: Robust security measures, including IAM roles, encryption, and access controls, to protect sensitive data.
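The Raw, Bronze, and Silver layering and the cleansing and normalization steps listed above can be illustrated with a minimal, in-memory Python sketch. It is not the project's implementation: the record fields (`source`, `email`, `name`, `updated_at`), the use of a normalized email address as the customer key, and the "latest update wins" merge rule are all hypothetical choices made for illustration. In a platform like the one described, these steps would typically run in services such as Glue or Lambda over objects in S3 rather than over Python lists.

```python
import json
from datetime import datetime, timezone

# Raw layer (hypothetical sample): payloads exactly as they landed from
# sources such as SFTP files, database extracts, or API responses.
RAW_RECORDS = [
    '{"source": "crm", "email": " Alice@Example.com ", "name": "alice tan", '
    '"updated_at": "2024-05-01T10:00:00+00:00"}',
    '{"source": "subscriptions", "email": "alice@example.com", '
    '"name": "Alice Tan", "updated_at": "2024-06-15T08:30:00+00:00"}',
    '{"source": "api", "email": "bob@example.com", "name": "BOB LIM", '
    '"updated_at": "2024-06-01T12:00:00+00:00"}',
    "not valid json",
]


def to_bronze(raw_records):
    """Bronze: parse raw payloads, add ingestion metadata, quarantine failures."""
    bronze, rejected = [], []
    for payload in raw_records:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            rejected.append(payload)  # kept for inspection, not silently dropped
            continue
        record["_ingested_at"] = datetime.now(timezone.utc).isoformat()
        bronze.append(record)
    return bronze, rejected


def to_silver(bronze_records):
    """Silver: cleanse, normalize, and merge into one record per customer."""
    customers = {}
    for record in bronze_records:
        email = record.get("email", "").strip().lower()  # hypothetical key
        if not email:
            continue  # no usable customer key
        updated_at = datetime.fromisoformat(record["updated_at"])
        name = record.get("name", "").strip().title()
        existing = customers.get(email)
        if existing is None:
            customers[email] = {
                "email": email,
                "name": name,
                "sources": {record["source"]},
                "updated_at": updated_at,
            }
            continue
        # Merge: remember every source; the most recent update wins for attributes.
        existing["sources"].add(record["source"])
        if updated_at > existing["updated_at"]:
            existing["name"] = name
            existing["updated_at"] = updated_at
    return customers


if __name__ == "__main__":
    bronze, rejected = to_bronze(RAW_RECORDS)
    silver = to_silver(bronze)
    print(f"{len(rejected)} payload(s) rejected")
    for email, customer in sorted(silver.items()):
        print(email, customer["name"], sorted(customer["sources"]))
```

Keeping rejected payloads rather than discarding them reflects the usual role of a Raw layer as an untouched copy of source data: if a cleansing rule changes, the Bronze and Silver layers can be rebuilt from Raw.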

Tech Stack

  • Python
  • Oracle
  • BigQuery
  • Terraform
  • AWS Services: S3, Aurora PostgreSQL, Step Functions, Glue, Redshift, CloudWatch, IAM, Lambda, Neptune, SQS, SNS, EventBridge

Impact

  • Unified Customer View: Consolidated customer data from multiple sources into a single, 360-degree view of each customer.
  • Improved Data Quality: Automated data cleansing and validation processes improved data accuracy and reliability.
  • Increased Efficiency: Automated data pipelines reduced manual effort and sped up data processing.
  • Enhanced Data Accessibility: Secure APIs and data cataloging made data readily available to business units.
  • Scalability and Reliability: The AWS-based platform scaled to handle increasing data volumes and user demand.
  • Better Compliance: Security measures and access controls supported compliance with data privacy regulations.
  • Data-Driven Decisions: Improved data availability and quality enabled better decision-making across the organization.

Conclusion

The Customer Data Platform project delivered a scalable, Lakehouse-based solution that consolidates customer data from previously siloed systems, automates ingestion and transformation, and gives business units secure, API-based access to trusted data. By building in security and access controls to meet data privacy requirements and relying on a cloud-native AWS technology stack, the organization created a foundation that strengthens its position in the media market. The project's success underscores the organization's commitment to innovation and data-driven decision-making.

Want to transform your data into business growth with an AWS Lakehouse?
