Big Data Architecture Fall 2019 Term Project

Version 28 September

Develop a Big Data application using a microservice architecture and deploy it on a Platform as a Service (PaaS). Your job is not only to build microservice and serverless applications but also to integrate them into a single distributed application that is useful to your users.

Your application must use the following big data technologies: Hadoop Distributed File System (HDFS) and MongoDB. Use the airline on-time performance data from 1996 through 2002, together with the additional datasets for airports, carriers, and planes. Store and process this data appropriately, then develop RESTful APIs for services including airport locating, carrier identification, plane information, airline on-time performance, and weather reports.

You need to develop a website or console application that lets users access these services: for example, retrieving the city name for an IATA airport code, weather reports, carrier or plane details, delay analyses, and statistics such as the average delay per carrier over a specified time frame.

All services should be integrated as RESTful web services, documented with Swagger, and developed in Java 1.8. Deploy your microservices in a local environment first for testing, then on Red Hat OpenShift (OKD). Test thoroughly before deployment, including unit tests with JUnit.

Paper for the Above Instructions

In recent years, the increasing volume and complexity of data in the aviation industry have necessitated the development of sophisticated big data architectures. These architectures enable the processing, storage, and analysis of vast datasets, thereby improving operational efficiency, safety, and customer experience. This paper describes the design and implementation of a microservice-based big data system for airline performance and related datasets, with an emphasis on deployment on a cloud PaaS such as Red Hat OpenShift.

System Architecture Overview

The proposed system employs a microservice architecture of modular, scalable, and maintainable components that communicate via RESTful APIs. The Hadoop Distributed File System (HDFS) provides scalable storage for the large airline on-time performance dataset, while MongoDB provides flexible, document-oriented storage for the airport, carrier, and aircraft reference data and for dynamic, semi-structured data such as weather reports and computed flight-delay results.
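To illustrate what communicating via RESTful APIs can look like in Java 1.8, the following is a minimal sketch of a carrier lookup endpoint built on the JDK's bundled com.sun.net.httpserver.HttpServer, chosen here only to keep the example free of third-party dependencies. The class name CarrierEndpoint, the /carriers/{code} path, port 8080, and the two-entry in-memory map standing in for the MongoDB carriers collection are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a carrier identification endpoint: GET /carriers/{code}. */
public class CarrierEndpoint {

    /** Status code plus JSON body, so routing can be tested without opening a socket. */
    public static final class Response {
        public final int status;
        public final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // In-memory stand-in for the MongoDB "carriers" collection (illustrative entries only).
    private static final Map<String, String> CARRIERS = new HashMap<>();

    static {
        CARRIERS.put("AA", "American Airlines Inc.");
        CARRIERS.put("UA", "United Air Lines Inc.");
    }

    /** Maps an HTTP method and path to a response; performs no I/O. */
    public static Response route(String method, String path) {
        if (!"GET".equals(method)) {
            return new Response(405, "{\"error\":\"method not allowed\"}");
        }
        String prefix = "/carriers/";
        if (!path.startsWith(prefix)) {
            return new Response(404, "{\"error\":\"not found\"}");
        }
        String code = path.substring(prefix.length()).toUpperCase(Locale.ROOT);
        // Carrier codes are two letters or digits; rejecting anything else also keeps
        // user input from being echoed unescaped into the JSON built below.
        if (!code.matches("[A-Z0-9]{2}")) {
            return new Response(400, "{\"error\":\"invalid carrier code\"}");
        }
        String description = CARRIERS.get(code);
        if (description == null) {
            return new Response(404, "{\"error\":\"unknown carrier\"}");
        }
        return new Response(200,
                "{\"code\":\"" + code + "\",\"description\":\"" + description + "\"}");
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // Every request under /carriers/ is delegated to the pure route() method.
        server.createContext("/carriers/", exchange -> {
            Response r = route(exchange.getRequestMethod(), exchange.getRequestURI().getPath());
            byte[] bytes = r.body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(r.status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        System.out.println("Listening on http://localhost:8080/carriers/{code}");
    }
}
```

With main running, a GET request to http://localhost:8080/carriers/aa should return the American Airlines entry. Keeping route free of I/O means it can be covered by ordinary JUnit tests, and a web framework with Swagger support could replace the raw HttpServer without changing that logic.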

Data Collection and Storage

Data ingestion is initiated manually or scheduled via scripts. Airline on-time performance data from 1996 through 2002 is preprocessed for consistency and then stored in HDFS. Datasets from aviation authorities for airports, carriers, and aircraft are processed and loaded into MongoDB collections. Together, these datasets include attributes such as airport codes, geographic coordinates, airline descriptions, aircraft specifications, and historical delay metrics.
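The text does not spell out what preprocessing for consistency involves. One plausible step, sketched below under assumptions, is normalizing each raw CSV row before it is written to HDFS: trimming fields, upper-casing codes, and mapping the literal "NA" to a missing value rather than to zero. The class names and the column positions (Year 0, Month 1, UniqueCarrier 8, ArrDelay 14, DepDelay 15, Origin 16, Dest 17) are assumptions that should be checked against the header row of the actual files.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical preprocessing step: normalizes one raw on-time CSV line. */
public class OnTimeRecordParser {

    // Assumed column positions; check them against the header row of the real files.
    static final int YEAR = 0, MONTH = 1, CARRIER = 8, ARR_DELAY = 14,
            DEP_DELAY = 15, ORIGIN = 16, DEST = 17;
    static final int MIN_COLUMNS = DEST + 1;

    /** Normalized subset of a flight row; delays are in minutes, null when missing. */
    public static final class FlightRecord {
        public final int year;
        public final int month;
        public final String carrier;
        public final Integer arrDelay;
        public final Integer depDelay;
        public final String origin;
        public final String dest;

        FlightRecord(int year, int month, String carrier, Integer arrDelay,
                     Integer depDelay, String origin, String dest) {
            this.year = year;
            this.month = month;
            this.carrier = carrier;
            this.arrDelay = arrDelay;
            this.depDelay = depDelay;
            this.origin = origin;
            this.dest = dest;
        }
    }

    /** Returns empty for the header row or a malformed row instead of throwing. */
    public static Optional<FlightRecord> parse(String line) {
        // Plain split: assumes no quoted fields containing commas; -1 keeps trailing empties.
        String[] f = line.split(",", -1);
        if (f.length < MIN_COLUMNS || "Year".equals(f[YEAR].trim())) {
            return Optional.empty();
        }
        try {
            return Optional.of(new FlightRecord(
                    Integer.parseInt(f[YEAR].trim()),
                    Integer.parseInt(f[MONTH].trim()),
                    normalizeCode(f[CARRIER]),
                    parseNullableInt(f[ARR_DELAY]),
                    parseNullableInt(f[DEP_DELAY]),
                    normalizeCode(f[ORIGIN]),
                    normalizeCode(f[DEST])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    static String normalizeCode(String raw) {
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    /** Maps "NA" or blank to null so a missing delay is never averaged in as zero. */
    static Integer parseNullableInt(String raw) {
        String s = raw.trim();
        if (s.isEmpty() || "NA".equalsIgnoreCase(s)) {
            return null;
        }
        return Integer.valueOf(s);
    }
}
```

Returning Optional.empty() for bad rows lets a batch job count and skip them instead of failing the whole load; whether to log, quarantine, or drop such rows is left open by the original design.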

Microservice Design and Implementation

Each microservice has a single, distinct responsibility, which keeps concerns separated. For example, the Airport Locating Service queries airport details by IATA code and adds current weather data, which is fetched by a serverless function that calls the OpenWeatherMap API. The Carrier Identification Service retrieves airline descriptions by carrier code, and the Plane Information Service returns aircraft details for a given tail number.
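The composition described above, an airport lookup by IATA code enriched with weather from a separate function, can be sketched as follows. This is an illustration under assumptions: the class and field names are hypothetical, a Map stands in for the MongoDB airports collection, and a java.util.function.Function stands in for the call to the serverless weather function, so either could be swapped for a real client without changing the lookup logic.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical Airport Locating Service: lookup by code plus optional weather. */
public class AirportLocatingService {

    public static final class Airport {
        public final String iata;
        public final String name;
        public final String city;
        public final String state;
        public final double latitude;
        public final double longitude;

        public Airport(String iata, String name, String city, String state,
                       double latitude, double longitude) {
            this.iata = iata;
            this.name = name;
            this.city = city;
            this.state = state;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Airport details plus a weather summary, which is null if the weather call failed. */
    public static final class AirportView {
        public final Airport airport;
        public final String weather;

        AirportView(Airport airport, String weather) {
            this.airport = airport;
            this.weather = weather;
        }
    }

    private final Map<String, Airport> airportsByCode;       // stand-in for MongoDB
    private final Function<Airport, String> weatherFunction; // stand-in for the serverless call

    public AirportLocatingService(Map<String, Airport> airportsByCode,
                                  Function<Airport, String> weatherFunction) {
        this.airportsByCode = airportsByCode;
        this.weatherFunction = weatherFunction;
    }

    public Optional<AirportView> locate(String code) {
        if (code == null) {
            return Optional.empty();
        }
        String key = code.trim().toUpperCase(Locale.ROOT);
        // Three letters or digits, on the assumption that the reference data may also
        // contain non-IATA location identifiers that include digits.
        if (!key.matches("[A-Z0-9]{3}")) {
            return Optional.empty();
        }
        Airport airport = airportsByCode.get(key);
        if (airport == null) {
            return Optional.empty();
        }
        String weather;
        try {
            weather = weatherFunction.apply(airport);
        } catch (RuntimeException e) {
            weather = null; // airport details are still useful without weather
        }
        return Optional.of(new AirportView(airport, weather));
    }
}
```

Catching the weather failure keeps the airport response available when the external API is unreachable; returning partial data versus an error is a design choice the text leaves open.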

The airline performance services analyze delay metrics to identify the airports or flights with the highest or lowest departure and arrival delays. These analyses run as Hadoop MapReduce or Apache Spark jobs over the large datasets, and their results are stored temporarily in MongoDB for quick retrieval.
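The average-delay-per-carrier statistic named in the assignment has the same map-then-reduce shape as the MapReduce or Spark jobs described above: key each flight by carrier, then combine the delays per key. The sketch below shows that logic in memory with Java 8 streams, so it illustrates the computation rather than implementing a distributed job. The Flight class is hypothetical, the time frame is assumed to be a range of months, and flights with a missing (null) arrival delay are excluded rather than counted as zero.

```java
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory sketch of "average arrival delay per carrier over a time frame". */
public class CarrierDelayStats {

    /** Hypothetical minimal flight record; arrDelay is in minutes, null when missing. */
    public static final class Flight {
        public final YearMonth period;
        public final String carrier;
        public final Integer arrDelay;

        public Flight(int year, int month, String carrier, Integer arrDelay) {
            this.period = YearMonth.of(year, month);
            this.carrier = carrier;
            this.arrDelay = arrDelay;
        }
    }

    /** Averages arrival delay per carrier for months in [from, to], inclusive; sorted by carrier. */
    public static Map<String, Double> averageArrivalDelay(List<Flight> flights,
                                                          YearMonth from, YearMonth to) {
        return flights.stream()
                .filter(f -> f.arrDelay != null)                       // skip missing delays
                .filter(f -> !f.period.isBefore(from) && !f.period.isAfter(to))
                .collect(Collectors.groupingBy(f -> f.carrier,         // "map": key by carrier
                        TreeMap::new,
                        Collectors.averagingInt(f -> f.arrDelay)));    // "reduce": mean per key
    }
}
```

In a MapReduce version, the mapper would emit (carrier, delay) pairs, and any combiner would need to pass along partial sums and counts rather than partial averages, because an average of averages differs from the overall average when groups have different sizes.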

Weather Serverless Service

The weather data is fetched on demand by a serverless function that sends HTTP GET requests to the OpenWeatherMap API. The response for current conditions includes temperature (in Kelvin by default, which the function converts to Fahrenheit), humidity, wind speed, and cloud coverage. This information is embedded in the airport information responses, giving weather context for operational decisions.
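Two parts of the weather function are easy to get subtly wrong: building the request URL and converting units, where F = (K - 273.15) * 9/5 + 32. A minimal sketch of both follows. It assumes OpenWeatherMap's current-weather endpoint (https://api.openweathermap.org/data/2.5/weather) queried by latitude and longitude with an API key supplied by the caller; the class and method names are hypothetical, and parsing the JSON fields (main.temp, main.humidity, wind.speed, clouds.all) is left to whichever JSON library the project uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

/** Hypothetical helpers for the serverless weather function. */
public class WeatherSupport {

    // OpenWeatherMap current-weather endpoint.
    static final String CURRENT_WEATHER_URL = "https://api.openweathermap.org/data/2.5/weather";

    /** Builds the GET URL for current conditions at a coordinate (default units, so Kelvin). */
    public static String currentWeatherUrl(double lat, double lon, String apiKey) {
        try {
            // Locale.ROOT guarantees a '.' decimal separator in the query string.
            return String.format(Locale.ROOT, "%s?lat=%.4f&lon=%.4f&appid=%s",
                    CURRENT_WEATHER_URL, lat, lon, URLEncoder.encode(apiKey, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** F = (K - 273.15) * 9/5 + 32 */
    public static double kelvinToFahrenheit(double kelvin) {
        return (kelvin - 273.15) * 9.0 / 5.0 + 32.0;
    }

    /** Short text summary to embed in the airport response; wind is m/s under default units. */
    public static String summary(double tempKelvin, int humidityPct, double windSpeed, int cloudPct) {
        return String.format(Locale.ROOT, "%.1f F, humidity %d%%, wind %.1f m/s, clouds %d%%",
                kelvinToFahrenheit(tempKelvin), humidityPct, windSpeed, cloudPct);
    }
}
```

OpenWeatherMap also accepts a units=imperial parameter that returns Fahrenheit directly; converting locally, as described here, keeps the raw Kelvin value available if other units are needed later.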

Client Application and Deployment

A web-based interface lets users search by airport code, carrier code, or flight ID. The interface calls the service APIs and presents their JSON responses. For example, querying an airport code returns the airport's details plus current weather, which can support flight planning or operational analysis.

The complete system is built and unit-tested locally with Maven and JUnit, then deployed to Red Hat OpenShift (OKD). Deployment involves packaging each microservice as a Docker image and running it as OpenShift pods exposed through services. OpenShift's orchestration restarts failed pods and allows each service to be scaled independently, which supports reliability and scalability in production.

API Documentation and Testing

Swagger is used for API documentation, providing clear, interactive descriptions of each microservice's endpoints, parameters, and responses. Integration tests verify that the services work together correctly, in particular the data exchanges between the storage and retrieval layers and the calls to the serverless weather function.

Conclusion

Implementing a microservice-based big data architecture for airline and aviation datasets can improve data analysis capabilities, operational insight, and user experience. Deployment on OKD highlights the importance of scalable, flexible infrastructure for processing large datasets and delivering up-to-date information to stakeholders.
