Tuesday, February 26, 2013

Ngoni Munyaradzi on Transcribe Bleek and Lloyd

Ngoni Munyaradzi is a Master's student in Computer Science at the University of Cape Town, South Africa, working on a research project on the transcription of the Digital Bleek and Lloyd collection. He kindly agreed to an interview over email, which I present below:

Your website does an excellent job explaining the background and motivation of Transcribe Bleek and Lloyd.  Can you tell us more about the field notebooks you are transcribing?

The Digital Bleek and Lloyd Collection is composed of dictionaries, artwork and notebooks documenting stories about the earliest inhabitants of Southern Africa, the Bushman people. The notebooks were written by Wilhelm Bleek, his sister-in-law, Lucy Lloyd and Dorothea Bleek (Wilhelm's daughter) in the 19th century, with the help of a number of Bushmen people who were prisoners in the Western Cape region of South Africa at the time. The notebooks were recorded in the |Xam and !Kun languages and English translations of these languages are available in the notebooks.

Link to the collection: http://lloydbleekcollection.cs.uct.ac.za/

Correct me if I'm wrong, but it seems like at least in the case of |Xam, you are working with one of the only representatives of an extinct language. Are there any standard data models for these kinds of vocabularies/bilingual texts which you're using?

There are no complete models - the best known models are still only partial.

I suspect that I'm not alone in wondering why these Bushman people were prisoners during the writing of these texts. Can you tell us a bit more about the Bleek/Lloyd informants, or point us to resources on the subject?

The bushman people were prisoners because of petty crimes and a grossly unfair colonial government.  On the Bleek and Lloyd website there is a story on each contributor.  There is information in various books on the subject as well, but I am not sure there is more that is known than what is on the website. see:
http://lloydbleekcollection.cs.uct.ac.za/xam.html
http://lloydbleekcollection.cs.uct.ac.za/kun.html

This is the first transcription project I'm aware of using the Bossa Crowd Create platform. What are the factors that led you to choose that platform and what's been your experience setting it up?

In 2011 when our project began Bossa was the most mature opensource crowdsourcing framework that was tailored for volunteer projects available. Due to this Bossa suited well with the project's requirements. The alternative crowdsourcing frameworks available at the time used payment methods.

Setting up the Bossa framework was a relatively straight-forward task. The documentation online is very thorough and with examples of how to set-up test applications. I also got assistance from David Anderson the developer of Bossa.

The Bushman writing system seems extremely complex with it's special characters and multiple diacritics. I see that you are using LaTeX macros to encode these complexities. Why did you decide on LaTeX and what has been the user response to using that notation?

So the project is part of ongoing research related to the Bleek and Lloyd Collection within our Digital Libraries Laboratory at the University of Cape Town. Credit for developing the encoding tool goes to Kyle Williams. And the reason why he chose to use LaTeX was that; using custom LaTeX macros allowed for both the problem of the encoding and visual rendering of the text to be solved in a single step. Developing a unique font for the Bushman script is something we might look at in the future!

Here's a link to a paper published on the encoding tool developed by Kyle Williams: http://link.springer.com/chapter/10.1007%2F978-3-642-24826-9_28

Overall the user feedback has been good, as most users are able to complete transcriptions using the LaTeX macros. We have gotten suggestions from users to use glyphs to encode the complexities. Currently the scope of my masters research project does not include that. There are talks in our research group to develop a unique font to represent the |Xam and !Kun languages, as this is not supported by Unicode.

User 1 Comment: "I think the palette handles the complexity of the character set very well. This material is inherently difficult to transcribe. The tool has, on the whole, been well thought out to meet this challenge. I think it needs to be improved in some ways, but considering the difficulties it is remarkably well done."

User 2 Comment: "VERY intuitive, after a few practice transcriptions. I actually enjoyed using the tool after a page was done."

This is incredibly useful. So far as I'm aware, yours is only the third crowdsourced transcription project that's surveyed users seriously (after the North American Bird Phenology Project and Transcribe Bentham). Do you have any advice on collecting user feedback at such an early stage?

Collecting user feedback in the early stages will tremendously help project administrators determine whether the setup of the project is easy to follow for participants. One can easily pick up any hindrances to user participation and address these early. From our project, I've found that participants can actually suggest very helpful ideas that will make the data collection process better.

Crowdsourced citizen science and cultural heritage projects have mostly been based in the USA, Northern Europe and Australia until recently -- in fact, yours is the first that I'm aware of originating in sub-Saharan Africa. I'd really like to know which projects inspired your work with Transcribe Bushman, and what your hopes are for crowdsourced transcription projects focusing on Africa?

Our work was mostly inspired by the success of GalaxyZoo at recruiting volunteers, and also the Transcribe Bentham project that explored the feasibility of volunteers performing transcription. I hope that more crowdsourced transcription projects will start-up within Africa in the near future. What would be interesting is to see a transcription project for the Timbuktu manuscripts of Mali. Beyond transcription, I would like to see other researchers adopting crowdsourcing in fields of specialty within Africa.

Thanks so much for this interview. If people want to help out on the project, what's the best way for them to contribute?

Interested participants can simply:
  1. Create an account on the project website.
  2. Watch a 5 minute video tutorial on how to transcribe the Bushman languages.
  3. With that, you are ready to start transcribing pages.

No comments: