[Afridig] New list member | Stephen Davis

Davis, Stephen srda227 at g.uky.edu
Wed May 10 19:56:28 SAST 2023


Hello all,
I'd like to introduce a new method, recently created by the Bitter Aloe
team, for searching the TRC hearing testimonies.  The Testimony
Search Engine (ML)
<https://streamlit.as.uky.edu/bap_sent_embedding/Testimony_Search_Engine_(ML)>
lets users search the testimonies semantically rather than relying on
keywords alone.  It is now possible to take a statement given by an
individual and look for statements that express a similar
experience, sentiment, or idea across the entire corpus of testimonies.  This
opens new points of entry into the corpus and lets users make
connections across approximately 2,400 testimonies.  These multiple entry
points enable a different kind of reading, one that follows an experience
across many testimonies rather than reading testimonies linearly, one by one.
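To make the contrast with keyword search concrete, here is a minimal sketch of how a semantic search can rank statements, assuming each statement has already been turned into a vector by a sentence-embedding model. The post does not say which model or similarity measure TSE (ML) uses; cosine similarity is a common choice and is assumed here. The helper names `cosine_similarity` and `semantic_search` and the three-dimensional toy vectors are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_search(query_vec: list[float],
                    statement_vecs: list[list[float]],
                    top_k: int = 3) -> list[tuple[int, float]]:
    """Return (statement index, score) pairs for the closest statements."""
    scored = [(i, cosine_similarity(query_vec, vec))
              for i, vec in enumerate(statement_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real sentence embeddings,
# which usually have hundreds of dimensions. Labels are illustrative only.
statements = [
    [0.9, 0.1, 0.0],  # e.g. a statement about swallowing tablets
    [0.0, 0.2, 0.9],  # e.g. a statement about a police chase
    [0.8, 0.3, 0.1],  # e.g. a statement about seeing a doctor
]
query = [1.0, 0.2, 0.0]
print(semantic_search(query, statements, top_k=2))
```

Because the ranking depends only on how close the vectors are, a statement can score highly without sharing any words with the query, which is the behaviour described above.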

Users can identify a statement to use as a query in TSE (ML) by running a
conventional keyword search on a topic of interest in our companion search
engine, TSE (BM25)
<https://streamlit.as.uky.edu/bap_sent_embedding/Testimony_Search_Engine_(BM25)>,
named after the BM25 ranking algorithm.  Once a statement of interest is
identified, users can copy its text and enter it as a query in TSE (ML) to
search all testimonies for semantically similar statements.  Although
results depend on several factors (the length of the statement, how
distinctive its expressions are, etc.), TSE (ML) will generally return
semantically similar results even when they share few or no keywords with
the query.  The following verbatim quote is taken from testimony given by
Gideon Nieuwoudt during an amnesty hearing held on 25 September 1997 in
Port Elizabeth.

*"What did these tablets taste of? Or did you merely swallow them?"*

The results returned for this query map out several overlapping semantic
fields raised by the statement.  These include the ingestion of tablets and
other substances, the actions of the mouth, and seeking treatment from
medical professionals.
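For comparison, the keyword side of this workflow, TSE (BM25), is named after the Okapi BM25 ranking function. Below is a minimal, self-contained sketch of BM25 scoring. The default parameters `k1=1.5` and `b=0.75`, the Lucene-style IDF, and the simple regex tokenizer are assumptions rather than details of the TSE (BM25) implementation, and the three example sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequencies within this document
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturate term frequency (k1) and normalise for length (b).
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "He gave me tablets and I swallowed them.",
    "The police vehicles drove past while we hid.",
    "I went to the doctor because of the tablets.",
]
print(bm25_scores("tablets swallowed", docs))
```

A document scores zero unless it contains at least one query term, which is why keyword search alone cannot surface the differently worded but semantically related statements that TSE (ML) is designed to find.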

One feature we did not anticipate, but which users may find useful, is a
novel search method we call 'query poetics'.  During testing we wrote
simulated statements of our own and used them as queries instead of
verbatim statements given by actual witnesses.  'Poetics' here carries its
secondary sense of that which is creative and productive; it does not imply
a query with some literary quality.  This allows a more freeform navigation
of the testimonies, driven solely by an experience, sentiment, or idea that
a user hopes to find represented somewhere in the testimonies.  The
following simulated statement, written by a user rather than an actual
witness, is a good example of the length and level of detail that will
generally return useful results.

*"I heard the dogs barking and I knew the police were there.  They were
chasing us and we hid between houses.  We could then see the vehicles going
by.  I have never been so scared in my life.  I thought for sure we would
be caught."*

Here too the results span multiple semantic fields, with movement, hiding in
fear, police vehicles, and complex urban geography the most prominent,
regardless of whether particular keywords are present or absent.

A word about methods: the Bitter Aloe group took all available hearing
testimonies and converted them from HTML into structured data (.json).
Testimonies were parsed into the individual statements made by each speaker.
Statements were then represented as vectors embedded in a high-dimensional
space, where statements with similar meanings lie close to one another.
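The parsing step can be sketched as follows. This is a hypothetical illustration, not the Bitter Aloe pipeline: it assumes each speaker's turn begins with an upper-case label ending in a colon (such as `CHAIRPERSON:`), extracts page text with Python's standard `html.parser`, and emits one JSON object per statement with hypothetical `speaker` and `text` fields. Statements in this form are what would then be passed to an embedding model.

```python
import json
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML page, in document order."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: &amp; etc. decoded
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


# Assumed convention: a turn starts with an upper-case speaker label and a colon.
SPEAKER_LABEL = re.compile(r"^([A-Z][A-Z .'-]+):\s*(.*)$", re.DOTALL)


def parse_statements(html_text: str) -> list[dict]:
    """Split a hearing transcript into one {speaker, text} record per statement."""
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()

    statements: list[dict] = []
    for chunk in extractor.chunks:
        match = SPEAKER_LABEL.match(chunk)
        if match:
            # A new label starts a new statement.
            statements.append({"speaker": match.group(1).strip(),
                               "text": match.group(2).strip()})
        elif statements:
            # No label: continuation of the previous speaker's statement.
            statements[-1]["text"] = (statements[-1]["text"] + " " + chunk).strip()
        # Chunks before the first label (e.g. page headers) are skipped.
    return statements


# Invented toy transcript, for illustration only.
page = ("<p>CHAIRPERSON: Good morning.</p>"
        "<p>WITNESS: Good morning.</p>"
        "<p>Thank you, Chair.</p>")
print(json.dumps(parse_statements(page), indent=2))
```

Text that appears before the first speaker label is dropped, and a paragraph without a label is appended to the previous speaker's statement; real transcripts would likely need more careful label detection.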

We'd be really grateful if anyone who has a chance to try this out could
provide us with feedback.  It's still in development, but we're slowly
rolling it out to see what changes need to be made before we present it to
a wider audience.

Regards

-Steve Davis


On Mon, May 8, 2023 at 9:20 AM Keith Breckenridge via Afridig <
afridig at lists.wiser.org.za> wrote:

> Stephen Davis is the Principal Investigator of the Bitter Aloe Project
> <http://www.bitteraloeproject.com>, a digital humanities research group
> that applies advanced machine learning models to materials collected by the
> Truth and Reconciliation Commission.  He is also author of *The ANC's War
> Against Apartheid: Umkhonto we Sizwe and the Liberation of South Africa*
> <https://iupress.org/9780253032294/the-ancs-war-against-apartheid/> (Indiana
> University Press, 2018).  His research interests include machine learning,
> human rights, the anti-apartheid struggle, and autobiography.  He is
> presently an Associate Professor of History at the University of Kentucky.
>
> Stephen Davis <srda227 at g.uky.edu>
> -------
> Keith Breckenridge  *W I S E R* - The Wits Institute for Social and
> Economic Research, University of the Witwatersrand | Pbag 3, PO Wits,
>  Johannesburg, South Africa, 2050 | Phone +27(0)11-7174272 | Web: wiser.wits.ac.za
>
> _______________________________________________
> Afridig mailing list
> Afridig at lists.wiser.org.za
> http://lists.wiser.org.za/listinfo/afridig
>


-- 


Stephen R Davis
Associate Professor
Department of History
University of Kentucky