A Model for the Analytical Performance of Data Lake in Stock Market Analysis with Databricks Delta Lake

S. Kamalakannan*, A. Yasmin, Arun Kumar Ramamoorthy, P. Kavitha

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Stock market investments are highly rewarding but also high in risk. Modern investors use a variety of tools to make informed investment decisions. In the current digital era, the financial services industry generates huge volumes and an immense variety of data at extreme speed. Because of the rapid growth in data collection and the heterogeneous, complex nature of the data, there is a need for a Big Data analytical solution that can deal with stock market data. Large volumes of unstructured, heterogeneous raw data can be stored in a massively scalable manner using data lakes, which are an ideal solution to the big data storage conundrum. The key feature of a data lake is its ability to preserve data in its original format while processing it at runtime using a schema-on-read technique. The challenge faced in the data lake is performing analytics, which is a significant tool for calculating and analyzing the stock market. The proposed architecture of Azure Databricks Delta Lake (ADDL) with Azure Data Lake Storage Generation 2 (ADLSG2) is used for analytical processes such as Fibonacci retracement for better stock analysis, which aids in forecasting the market price for better investment. Accordingly, the research focus is to produce storage with both read and write capabilities by taking into consideration the Extract-Load-Transform (ELT) operation on the data source. In this experimental Databricks implementation, the runtime is built on the open-source Apache Spark API and a highly improved execution engine, which results in a significant performance improvement compared with the standard Apache Spark available on the ADLS platform. Additionally, the Fibonacci retracement levels are calculated as part of the analytics, and the test close prices forecast with various ML and DL techniques, such as KNN and LSTM, are compared with the original prices of the test data for better prediction of the forecast close price.
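The abstract names Fibonacci retracement as one of the analytical processes but does not spell out the calculation. A minimal sketch of the conventional computation follows, assuming the levels are taken between the highest and lowest close in a window and using the commonly quoted ratio set; the function name `fibonacci_retracement_levels`, the ratio set, and the sample prices are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Sequence

# Commonly used retracement ratios (an assumption; the abstract does not list them).
RATIOS = (0.0, 0.236, 0.382, 0.5, 0.618, 0.786, 1.0)


def fibonacci_retracement_levels(
    closes: Sequence[float], ratios: Sequence[float] = RATIOS
) -> Dict[float, float]:
    """Return a price level for each ratio, measured down from the window high.

    Hypothetical helper: the swing high and low are simply the max and min
    close in the window, which may differ from the paper's choice.
    """
    if not closes:
        raise ValueError("closes must not be empty")
    high = max(closes)
    low = min(closes)
    span = high - low
    # Ratio 0.0 maps to the high, ratio 1.0 maps to the low.
    return {ratio: high - span * ratio for ratio in ratios}


if __name__ == "__main__":
    sample_closes = [100.0, 120.0, 150.0, 200.0, 180.0]  # made-up prices
    for ratio, price in fibonacci_retracement_levels(sample_closes).items():
        print(f"{ratio:.3f} -> {price:.2f}")
```

In a Databricks setting the same arithmetic would presumably run over close prices read from a Delta table, but that wiring is left out here because the abstract does not describe it.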
Original language: English
Title of host publication: 2023 International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS)
Publisher: IEEE Computer Society
Pages: 1065-1071
Number of pages: 7
ISBN (Electronic): 979-8-3503-0085-7
ISBN (Print): 979-8-3503-0086-4
DOIs
Publication status: Published - 6 Dec 2023
Event: International Conference on Self Sustainable Artificial Intelligence Systems - Erode, India
Duration: 18 Oct 2023 to 20 Oct 2023

Conference

Conference: International Conference on Self Sustainable Artificial Intelligence Systems
Abbreviated title: ICSSAS 2023
Country/Territory: India
City: Erode
Period: 18/10/23 to 20/10/23

Keywords

  • Cloud computing
  • Runtime
  • Costs
  • Cluster Computing
  • Lakes
  • Writing
  • Big Data Applications
