Posted on February 7, 2018

This week’s readings, I must admit, put me in a somewhat difficult position, mostly because I found them so illuminating. Before taking this course, the most advanced tool I knew of for pulling large amounts of text data from published sources was Voyant. Brandon Walsh and Sarah Horowitz briefly discuss this tool in their work, and they illustrate its ability to generate compelling and visually captivating word clouds. I actually attempted to use Voyant for a digital humanities project at my previous institution, where I was tasked with analyzing hundreds of books from a nineteenth-century Jesuit library. Even if I could not make any major discoveries, I hoped at the very least to create some eye-catching displays that could be used in conference presentations. However, the project had neither the staffing nor the funding to scan all the books into text files, and not all of the books were available through Project Gutenberg or similar websites. I was, therefore, left to manually input data into a spreadsheet like the one below:

Obviously, this left me with little time to dig into the meat of the texts through methods like distant reading. That is likely why this week I found myself so interested in topic modelling and its application in the “Mining the Dispatch” project. Robert K. Nelson used the MALLET software package to identify recurring topics in a digitized collection of newspapers from Richmond, Virginia, during the Civil War. Although he was reluctant to draw major conclusions from the project (instead admitting that topic modelling needs to be complemented with traditional archival research), I was struck by his initial findings, including those on anti-northern diatribes and poetry that he parlayed into a New York Times piece. Topic modelling should definitely not represent the final stage of a scholarly endeavor, but it can help launch some very convincing arguments.

I can honestly imagine this digital method benefiting my own research, and I can envision a project where I identify treaties with Native Americans from the Great Lakes region and search for topics related to women and their métis children. Therein lies the difficult position in which I now find myself. I have already thought a good deal about my previously discussed mapping project, and I am not sure whether I should allow an introduction to new technology, no matter how exciting, to divert me from that path at this point. Additionally, I am not sure whether a topic modelling project would fit the source base that I am currently hoping to interrogate because, as Megan R. Brett mentions, many topic modelling software packages (like Paper Machines in her case) require at least a thousand documents to produce fruitful results. At the very least, though, we have already discussed ways in previous weeks to convert PDF files to text that computer programs can understand, so I do not think I face the same technical limitations that I did a few years ago. As always, I am open to any comments or suggestions as the due date for the updated project proposal looms.


