Business Rule Generation for Product Data Quality Assurance

  • Valentina Valeva

    Student thesis: Doctoral Thesis

    Abstract

    Business operations rely on the electronic exchange of data and suffer difficulties caused primarily by poor data quality, which leads to service failures throughout the supply chain. Available software solutions serve the customer-oriented business sectors well, but few such solutions exist for the product-oriented industries.

    The University of South Wales and industrial project partner GXS Ltd., supported by an Engineering and Physical Sciences Research Council (EPSRC) grant, have addressed the lack of software solutions to product data quality problems. GXS produces business rules for customers in the Retail Fast Moving Consumer Goods and Consumer Electronics sectors. It incorporates these rules into its Product Data Quality (PDQ) service for data quality checking, which enables its customers to identify quality failures in their data. However, generating business rules is time-consuming. The work must be repeated whenever the business environment changes, new products or new sales channels are added, or existing products change. There is a clear need to automate product data quality business rule generation in this sector.

    The project described in this thesis aimed to establish a methodology for the automated generation of Product Data Quality rules within the Retail Fast Moving Consumer Goods sector, through analysis of the sector and of existing Product Data Quality rules. The methodology was tested by building a working prototype system that conforms to industry requirements, and this testing established the most successful parameters for its operation. The work required a detailed study of the semantic nature of product data in the sector. Because the data are hierarchical, the same data can be viewed from multiple perspectives according to their contributions to different business rules. The study therefore also considered the multiple relations between data that constitute the business rules.

    This thesis describes work on software solutions to product data quality problems. Specifically, it explores the contexts in which notions of acceptable data quality should be understood, describes the data quality services currently offered, and establishes the need to automate product data quality rule generation. It describes methods for ensuring data quality and relevant data mining techniques. It then proposes a methodology for automated rule generation, framed within the requirements of the sector, and outlines the stages of the rule generation algorithm. The results of applying the methodology to real supply chain data are presented. The thesis concludes with the main findings and achievements of the work. The proposed methodology is capable of producing consistent and reliable rules. In the case study examined, the generated rules have greater functionality, are fewer in number, and are more compact than those produced by human rule developers. A role remains for the human expert, who populates the knowledge base with knowledge of the sector. Nevertheless, this thesis demonstrates that rule generation can be automated, bringing the company benefits in speed, completeness and reliability. The outcomes are expected to have value for studies of product data quality assurance more generally.
    Date of Award: 2019
    Original language: English
    Supervisors: Paul Roach and Ian Wilson
