Snowflake and AWS Data Pipeline
Introduction
Maintainer: brandi_coleman@bmc.com
This use case is available for the VSE Demo System in both QA & Prod via the On-Demand Demo. It is currently under construction for the Helix Control-M Demo Systems.
This flow is ordered daily via the zzz-order process in AWS production and runs at 10:00 UTC. A daily cleanup process then removes the flow and the containerized agent it runs on.
This use case is also available as an On-Demand Demo. The On-Demand Demo Service run of this flow takes approximately 7 minutes to complete.
Use Case Overview
This workflow uses applications such as Snowflake, Kafka, and AWS Lambda to create an AWS-centric data pipeline.
Use Case Technical Explanation
The first job in this workflow runs a Snowflake query that selects specific information and writes it to a CSV file on the local pod. The file is then transferred to S3 as well as to Lambda. The Kafka CLI is installed for this demo; it first creates a topic, then puts a "Hello World" message on the topic, and subsequently deletes the topic within Kafka. A Databricks job type uses Spark and creates a notebook. The flow also includes an SLA Management job type at the end to monitor the entire data pipeline.
To view the demo flow code base, please navigate to the Snowflake and AWS Data Pipeline Git Repository.
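The actual job definitions live in the Git repository linked above. Purely as a rough illustration, the Python sketch below approximates the first steps of the pipeline (Snowflake query to a local CSV, upload to S3, and a Lambda invocation). Every identifier in it (credentials, query, bucket, and function name) is a placeholder and is not taken from the demo itself.

```python
# Hypothetical sketch of the pipeline's first steps: query Snowflake into a
# local CSV, push the file to S3, and trigger a Lambda function.
# All identifiers below (query, bucket, function name, credentials) are
# placeholders -- the real job definitions live in the demo's Git repository.
import csv
import json

import boto3
import snowflake.connector

# 1. Run the Snowflake query and write the result set to a local CSV file.
conn = snowflake.connector.connect(
    user="DEMO_USER",            # placeholder credentials
    password="********",
    account="demo_account",
    warehouse="DEMO_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM demo_table LIMIT 100")  # placeholder query
    with open("demo_extract.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
conn.close()

# 2. Transfer the CSV to S3 (handled by a File Transfer job in the demo flow).
s3 = boto3.client("s3")
s3.upload_file("demo_extract.csv", "demo-bucket", "incoming/demo_extract.csv")

# 3. Invoke a Lambda function for the new object (AWS job type in the demo flow).
lam = boto3.client("lambda")
lam.invoke(
    FunctionName="demo-pipeline-function",  # placeholder function name
    Payload=json.dumps({"key": "incoming/demo_extract.csv"}),
)
```

The Kafka steps in the demo are driven through the Kafka CLI inside an OS job; the confluent-kafka sketch below is only an equivalent illustration of the same create-topic / produce / delete-topic cycle, again with a placeholder broker address and topic name.

```python
# Equivalent illustration of the demo's Kafka CLI steps using confluent-kafka:
# create a topic, produce one "Hello World" message, then delete the topic.
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

BOOTSTRAP = "localhost:9092"  # placeholder broker address
admin = AdminClient({"bootstrap.servers": BOOTSTRAP})

# Create the topic and wait for the operation to complete.
created = admin.create_topics([NewTopic("demo-topic", num_partitions=1, replication_factor=1)])
created["demo-topic"].result()

# Put a single "Hello World" message on the topic.
producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce("demo-topic", value=b"Hello World")
producer.flush()

# Delete the topic again.
deleted = admin.delete_topics(["demo-topic"])
deleted["demo-topic"].result()
```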
Job Types Included
- File Transfer (S3)
- AWS (Lambda)
- File Transfer (Local File System)
- OS (Kafka)
- Database Embedded Query (Snowflake)
- Hadoop (Dummy Job)
- SLA Management
Demo Environment Information
| Environment | Status |
|---|---|
| Helix Production | Under Construction |
| VSE CTM PROD | Available |
| VSE CTM QA | Available |