BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T171524Z
LOCATION:D222
DTSTART;TZID=America/Chicago:20221113T093000
DTEND;TZID=America/Chicago:20221113T094500
UID:submissions.supercomputing.org_SC22_sess432_ws_cafcw113@linklings.com
SUMMARY:Long Document Transformers for Pathology Report Classification
DESCRIPTION:Workshop\n\nLong Document Transformers for Pathology Report Cl
 assification\n\nChandrashekar, Lyngaas, Gao, Hanson, Gounley\n\nIn rec
 ent years, deep learning-based models for electronic health records hav
 e shown impressive results in many clinical tasks. Deep learning classi
 fication models typically require large labeled training datasets and a
 re designed to address specific clinical tasks. Transformers are powe
 rful state-of-the-art language models designed to learn inherent patte
 rns in unstructured text data in an unsupervised manner. The transf
 ormer model’s unsupervised training enables generalizability and reus
 ability of the model across various clinical tasks, removing the need fo
 r labeled data in the pre-training phase. The trained transformer can th
 en be fine-tuned toward a specific clinical task using a small but tas
 k-curated training dataset. In the current work, we build a transformer m
 odel that can effectively accommodate the length of typical cancer patho
 logy reports. We use 5.7 million pathology reports from six Surveill
 ance, Epidemiology, and End Results (SEER) Program cancer registries t
 o train the Big-Bird model “from scratch”. Big-Bird is a transfor
 mer model built for long documents (up to 4096 tokens), compared with p
 opular models such as BERT (up to 512 tokens). Because the memory req
 uirement of full self-attention scales quadratically with the sequence l
 ength of the input text, Big-Bird uses sparse attention. In phase one, B
 ig-Bird is pre-trained in an unsupervised manner using the masked langu
 age modeling task. This phase requires the largest amount of computa
 tion, and it leverages the secure CITADEL capability for working with p
 rotected health information (PHI) on the Summit supercomputer at the O
 ak Ridge Leadership Computing Facility. In phase two, we fine-tune th
 e pre-trained Big-Bird model on five information extraction tasks: s
 ite, subsite, histology, laterality, and behavior. For fine-tuning, w
 e use data from six SEER registries, restricted to reports within a 10-d
 ay window before and after the date of cancer diagnosis, and the grou
 nd truth for the five tasks comes from the manually coded CTC (Cancer/
 Tumor/Case) report. One advantage of this two-phase approach is that t
 he phase-one model can be reused for any pathology-relevant clinical ta
 sk in phase two. Our results show that the proposed Big-Bird model fin
 e-tuned with SEER data on five information extraction tasks outperform
 s the current state-of-the-art deep learning classification model by a
 n average of 2% in micro F1 score and an average of 8% in macro F1 sc
 ore across all tasks. On the most challenging tasks, subsite shows a 4
 % increase in micro F1 score and histology shows a 25% increase in macr
 o F1 score. The results demonstrate the promise of using a single pretr
 ained model on five related clinical tasks. We plan to further test th
 e generalizability and reusability of the model by extending it to othe
 r clinically useful tasks such as biomarker extraction and identifica
 tion of malignant and metastatic disease.\n\nSession Format: Recorded
 \n\nRegistration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR