Project Overview
This project demonstrates a real-time data pipeline for stock market data. Apache Kafka handles the streaming, Amazon S3 stores the consumed records, and AWS Glue and Amazon Athena make the stored data queryable.
Key Features:
Real-time data ingestion using a Kafka producer.
Data storage in Amazon S3 via a Kafka consumer.
Data cataloging using an AWS Glue crawler.
Data querying with Amazon Athena for analysis.
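The ingestion and storage features above could be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual code: the record fields, the symbol list, the object key layout, and every function name are assumptions made for the example. The Kafka and S3 calls are passed in as plain callables so the core logic can be run without a broker; run_with_real_clients shows one way to wire them to the kafka-python and boto3 libraries, assuming both are installed and configured.

```python
import json
import random
import time
from typing import Callable, Iterable, Optional

# Hypothetical ticker symbols, used only for illustration.
SYMBOLS = ["AAPL", "MSFT", "GOOG"]


def make_tick(rng: random.Random, now: Optional[float] = None) -> dict:
    """Build one synthetic stock record (the field names are assumptions)."""
    return {
        "symbol": rng.choice(SYMBOLS),
        "price": round(rng.uniform(100.0, 500.0), 2),
        "volume": rng.randint(1, 1000),
        "ts": time.time() if now is None else now,
    }


def produce(send: Callable[[bytes], None], count: int, rng: random.Random) -> None:
    """Producer side: serialize each record as JSON and hand it to `send`."""
    for _ in range(count):
        send(json.dumps(make_tick(rng)).encode("utf-8"))


def consume(
    messages: Iterable[bytes],
    put: Callable[[str, bytes], None],
    prefix: str = "ticks",
) -> int:
    """Consumer side: store each message as its own object; return the count."""
    stored = 0
    for index, payload in enumerate(messages):
        json.loads(payload)  # raises ValueError if the payload is not valid JSON
        put(f"{prefix}/tick_{index}.json", payload)
        stored += 1
    return stored


def run_with_real_clients(topic: str, bucket: str, servers: str) -> None:
    """Wire the helpers to kafka-python and boto3 (both must be installed)."""
    import boto3
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers)
    produce(lambda payload: producer.send(topic, value=payload), 10, random.Random())
    producer.flush()

    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        topic, bootstrap_servers=servers, auto_offset_reset="earliest"
    )
    # Iterating a KafkaConsumer blocks waiting for new messages,
    # so this call runs until the process is interrupted.
    consume(
        (message.value for message in consumer),
        lambda key, body: s3.put_object(Bucket=bucket, Key=key, Body=body),
    )
```

Writing one small object per message keeps the sketch simple. A real deployment would more likely batch records before uploading, since many tiny S3 objects tend to slow down crawling and querying.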
Architecture
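EC2 Instance: Hosts the Kafka broker and ZooKeeper.
Kafka Producer: Generates stock market data and publishes it to a Kafka topic.
Kafka Consumer: Reads records from the topic and writes them to S3.
S3 Bucket: Stores the consumed records.
AWS Glue Crawler: Scans the data in S3 and creates table definitions in the Glue Data Catalog.
Amazon Athena: Runs SQL queries against the cataloged data.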
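The final stage of the architecture, querying the cataloged data, could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the table name, the symbol and price columns, and the function names are hypothetical. The client argument is expected to behave like boto3.client("athena"), using its start_query_execution and get_query_execution calls.

```python
import time
from typing import Any, Tuple


def build_summary_query(table: str) -> str:
    """Return SQL averaging price per symbol (column names are assumptions)."""
    # Allow only letters, digits, and underscores so the name is safe to embed.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT symbol, AVG(price) AS avg_price, COUNT(*) AS ticks "
        f'FROM "{table}" GROUP BY symbol ORDER BY symbol'
    )


def run_query(
    athena: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> Tuple[str, str]:
    """Start an Athena query and poll until it reaches a final state.

    Returns (query_execution_id, final_state).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With real AWS credentials, the Glue crawler would typically be run first, for example with boto3.client("glue").start_crawler(Name=...), so that the table exists in the catalog before run_query is called with boto3.client("athena"). The output location is an S3 path where Athena writes its result files.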
Technologies Used:
Apache Kafka
Amazon S3
AWS Glue Crawler
Amazon Athena
Amazon EC2
Conclusion:
This pipeline moves stock market data from a Kafka producer into S3 as it arrives and makes it queryable through Athena, supporting analysis and decision-making in financial markets.