Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders

Hok Shing Lau*, Mark Huntly, Nathon Morgan, Adesua Iyenoma, Biao Zeng, Tim Bashford

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed


Abstract

Speech carries information that is clinically relevant to some diseases and could therefore be used for health assessment. Recent work shows interest in applying deep learning algorithms, especially large pretrained speech models, to Automatic Speech Assessment. One question that remains unexplored is how these models arrive at their outputs from their inputs. In this work, we train and compare two configurations of the Audio Spectrogram Transformer [1] for Voice Disorder Detection and apply the attention rollout method [2] to produce model relevance maps, which give the computed relevance of each spectrogram region to the model's predictions. We use these maps to analyse how the models make predictions under different conditions, and to show that the spread of attention is reduced as a model is finetuned and that the model's attention concentrates on specific phoneme regions.
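The attention rollout method named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each layer's attention has already been averaged over heads into a row-normalised matrix, models the residual connection by mixing with the identity in equal parts, and multiplies the layers together from first to last. The function names and the toy matrices are hypothetical and serve only as illustration.

```python
# Minimal attention rollout sketch in pure Python (no external libraries).
# Assumption: `layers` is a list of per-layer attention matrices, already
# averaged over heads, where each row sums to 1.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_residual(attn):
    """Mix attention with the identity (0.5*A + 0.5*I), then renormalise rows."""
    n = len(attn)
    mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[v / sum(row) for v in row] for row in mixed]


def attention_rollout(layers):
    """Propagate attention through all layers, first layer to last."""
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        rollout = matmul(add_residual(attn), rollout)
    return rollout


# Hypothetical toy input: 2 layers, 3 tokens (token 0 stands in for a class token).
toy_layers = [
    [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]],
]

# Row 0 gives the rolled-out relevance of every input token to token 0.
relevance = attention_rollout(toy_layers)[0]
print(relevance)
```

In a transformer that operates on spectrogram patches, the entries of such a row that correspond to patch tokens could be reshaped onto the time-frequency grid to obtain a relevance map; how the patches are laid out depends on the model configuration and is not specified here.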
Original language: English
Title: Artificial Intelligence in Healthcare - 1st International Conference, AIiH 2024, Proceedings
Subtitle: First International Conference, AIiH 2024, Swansea, UK, September 4–6, 2024, Proceedings, Part I
Editors: Xianghua Xie, Gibin Powathil, Iain Styles, Marco Ceccarelli
Publisher: Springer
Pages: 59-72
Number of pages: 14
ISBN (Electronic): 978-3-031-67278-1
ISBN (Print): 978-3-031-67277-4
Status: Published - 14 Aug 2024
Event: AIiH: International Conference on AI in Healthcare - Swansea, United Kingdom
Duration: 4 Sept 2024 to 6 Sept 2024
https://aiih.cc/#:~:text=Late%2Dbreaking%20Abstract%20Submission%20is,to%20Friday%206%20September%202024.

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 14975
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: AIiH: International Conference on AI in Healthcare
Country/Territory: United Kingdom
City: Swansea
Period: 4/09/24 to 6/09/24

  • AIiH: International Conference on AI in Healthcare

    Biao Zeng (Speaker)

    4 Sept 2024 to 6 Sept 2024

    Activity: Participating in or organising an event; Participation in conference

  • INTERSPEECH 2024

    Biao Zeng (Organiser), Xiaoyu Zhou (Speaker), Tom Powell (Speaker), Tim Bashford (Speaker), Mark Huntly (Speaker), Nathan Morgan (Speaker) & Robert Salter (Speaker)

    1 Sept 2024

    Activity: Participating in or organising an event; Participation in conference

  • Speech Breathing Workshop

    Biao Zeng (Organiser), Shakiela Davies (Organiser) & Chelsea Owen (Organiser)

    6 Jun 2023

    Activity: Participating in or organising an event; Organising an event
