An exploration into overlooked areas of forensics data analysis, case management and metadata using automation and natural language processing

  • Glenn Nor

    Student thesis: Doctoral Thesis

    Abstract

    As computer components keep getting cheaper while also becoming better, faster and higher in capacity, digital forensic investigations are negatively affected by the increasing complexity that comes from information overload. Finding correlations, patterns and critical evidence in today’s digital evidence, with data sizes of hundreds to thousands of gigabytes, is becoming increasingly difficult. Current literature shows attempts at addressing these issues, but there is still a large gap between the current state and what is needed to solve them efficiently.

    The work presented in this portfolio introduced novel approaches to (1) handling case management and metadata of any size automatically and presenting it through a distributed interface; (2) automatic analysis of large-scale email communications using machine learning to give unique insights in graph form; and (3) automatic preparation of custodian activity correlation and document content (entity) correlation for use in graph-based and timeline-based correlation visualizations. To develop these insights, (a) the computational genre of design science research methodology and (b) agile development methodology were used, allowing for iterative and cycle-based development.

    The research revealed that several areas of digital forensics are overlooked by the digital forensics research community, including areas beyond those covered by this portfolio. It also revealed that data science, mathematics and iterative development methodologies can provide solutions to the issues found in these overlooked areas.

    An important implication of this work is the emergent discovery of how these overlooked areas of digital forensics can easily integrate into a grand framework for digital forensic investigations. Instead of having random research projects, some crossing areas with others, some helping others develop, the grand framework design introduces a more community-driven approach to a coherent integration across several research areas. This portfolio marks the introduction and beginning of this grand framework, but there are more research areas, and this framework can grow much larger than what is presented here.
    Date of Award: 2024
    Original language: English
    Supervisor: Mabrouka Abuhmida (Supervisor) & Eric Llewellyn (Supervisor)

    Keywords

    • Computer forensics
    • Digital evidence
    • Digital forensics
    • Natural Language Processing
    • Machine Learning
    • Automation
    • Metadata
    • Sentiment Analysis
    • Named Entity Recognition
    • Case management
    • Email analysis
