Manually removed the Project Gutenberg header and footer from the text, as well as the table of contents. Note that Around the World in 80 Days is in the public domain in the U.S., and according to the Project Gutenberg License, once you have removed the Gutenberg license and any references to Project Gutenberg, what you have left is a public domain ebook, and “you can do anything you want with that.”

Split the text into individual files by chapter using csplit (a command-line utility that splits a file on a pattern):

csplit -f chapter 80days.txt "/^Chapter/" ''

Ran the NameDropper lookup-names Python script on each chapter file to generate a CSV file of Places for each chapter. (Note that this is C-shell foreach syntax; if you use something else you’ll have to find out the for loop syntax.)

lookup-names -input text $ch -c 0.1 -types Place -csv $ch.csv

At this point, I concatenated the individual chapter CSV files into a single CSV file that I could import into Excel, where I spent some time sorting the results by support and similarity scores to try to find some reasonable cut-off values that would filter out mis-recognized names without losing too many accurate names that DBpedia Spotlight identified with low certainty. It was helpful to be able to look at the data and get familiar with the results, but I think now I might skip this step.
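For anyone who would rather experiment with cut-off values in a script than in Excel, here is a rough Python sketch of the concatenate-and-filter step. It is not part of NameDropper. The column names `support` and `similarity`, the `chapter*.csv` file pattern, and the sample cut-off numbers are all assumptions for illustration, so check them against the headers in your own CSV files.

```python
import csv
import glob


def read_chapter_rows(paths):
    """Concatenate rows from several per-chapter CSV files.

    Each row is tagged with the file it came from so that results can
    still be traced back to a chapter after merging.
    """
    rows = []
    for path in sorted(paths):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["chapter_file"] = path
                rows.append(row)
    return rows


def filter_rows(rows, min_support, min_similarity,
                support_col="support", similarity_col="similarity"):
    """Keep rows whose support and similarity both meet the cut-offs.

    The column names are assumed, not taken from NameDropper's output.
    """
    kept = []
    for row in rows:
        try:
            support = float(row[support_col])
            similarity = float(row[similarity_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric scores
        if support >= min_support and similarity >= min_similarity:
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = read_chapter_rows(glob.glob("chapter*.csv"))
    # Placeholder cut-offs: try a few and see how many names survive.
    for min_support in (0, 100, 1000):
        kept = filter_rows(rows, min_support, min_similarity=0.2)
        print(min_support, len(kept), "of", len(rows))
```

Printing how many rows survive each candidate threshold gives a quick feel for the trade-off between dropping mis-recognized names and losing real ones, though it does not replace actually reading through the borderline results.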