I have several books that I'm preparing for CCEL, with information on lists of authors (name and canonical AuthorID) that would be worth incorporating into Hymnary. Normalizing author ID is going to be a big job: it would be much faster if I could get a table of author names and canonical ID's out of the hymnary, then batch (and script) the cross-reference process. Is there any way to extract that table?
Comments
And in a related question,
And in a related question, given that the file will be coming in XML format, what kind of markup would facilitate automatic extraction of author information?
something like
<div class="biography" id="smithj">
?
API
Stephen,
We have an API that lets you get query results in JSON -- would that help you get authorIDs and names? There is info on the bottom of the widgets page. If you need us to export the names and IDs and send them to you, or do something else, let me know.
Of course, editors can go to a hymn edit page and add an author. The autocomplete would help finding the correct authorID if you are going to do it one at a time.
We don't have defined markup in ThML for this purpose. The id attribute might not be best since IDs are supposed to be unique in a document. It wouldn't be valid if there are duplicates. Maybe use an authorID attribute? It's not in the DTD, but at least the resulting document could be well-formed.
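For illustration, here is a rough sketch of how author information could be pulled out of a file marked up that way, using Python's standard xml.etree.ElementTree. It assumes the suggested authorID attribute on each biography div; the h3 heading holding the author's name and the sample IDs are purely hypothetical, so adjust them to whatever the book actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the authorID attribute and the <h3> name heading
# are assumptions for this sketch, not part of the ThML DTD.
SAMPLE = """<body>
  <div class="biography" authorID="smithj">
    <h3>Smith, John</h3>
    <p>Biography text.</p>
  </div>
  <div class="biography" authorID="doej">
    <h3>Doe, Jane</h3>
    <p>Biography text.</p>
  </div>
  <div class="chapter"><h3>Not a biography</h3></div>
</body>"""


def extract_authors(xml_text):
    """Return (authorID, name) pairs for every biography div."""
    root = ET.fromstring(xml_text)
    authors = []
    for div in root.iter("div"):
        # Skip divs that are not biographies.
        if div.get("class") != "biography":
            continue
        author_id = div.get("authorID")
        heading = div.find("h3")
        # Join all text inside the heading, in case it has nested markup.
        name = "".join(heading.itertext()).strip() if heading is not None else ""
        if author_id:
            authors.append((author_id, name))
    return authors


AUTHORS = extract_authors(SAMPLE)
```

Because the parser only requires well-formed XML, an attribute that is not in the DTD does not get in the way of extraction.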
Harry
AuthorID or id?
I thought about that. But in this case "id" will be unique--each book only has one biography for each author.
I'll look into the widgets, thanks!
Stephen
ADDED: The widgets let you do a search that returns all author names (and gender)--but the search doesn't return any other fields, even fields that you "added to the search".
Suggestion for enhancement: show the value of other fields that were added to the search.
another thing
Another thing to try: select the "people" result type in the search widget. Then hit the "export as CSV" at the bottom of the page. You can only get 3000 results this way, but I think it has everything you need.
I'm building scripts to
I'm building scripts to automatically match names that can be unambiguously matched, but for that I need a complete canonical list of personID's. I can get the results of some kinds of searches as CSV, but not the results of browsing through people using the alphabet bar (which is what I need to build a comprehensive list 3000 people at a time).
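As a rough sketch of the matching step, assuming one or more CSV exports are already in hand: the column names "name" and "personID" and the sample IDs below are hypothetical, and the normalization (lowercase, drop commas and periods, collapse spaces) is only one possible choice. A name counts as matched only when it maps to exactly one ID across all exports; everything else is set aside for manual review.

```python
import csv
import io
from collections import defaultdict


def normalize(name):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def build_index(csv_texts, name_col="name", id_col="personID"):
    """Map each normalized name to the set of IDs seen in any export."""
    index = defaultdict(set)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            index[normalize(row[name_col])].add(row[id_col])
    return index


def match_names(names, index):
    """Split names into unambiguous matches and ones needing review."""
    matched, unresolved = {}, []
    for name in names:
        ids = index.get(normalize(name), set())
        if len(ids) == 1:
            matched[name] = next(iter(ids))
        else:
            # Either no candidate or several candidates share the name.
            unresolved.append(name)
    return matched, unresolved


# Hypothetical export chunks; real exports may use different columns.
EXPORT_A = 'name,personID\n"Smith, John",Smith_J\n"Watts, Isaac",Watts_Isaac\n'
EXPORT_B = 'name,personID\n"Smith, John",Smith_J2\n"Watts, Isaac",Watts_Isaac\n'

INDEX = build_index([EXPORT_A, EXPORT_B])
MATCHED, UNRESOLVED = match_names(
    ["Watts, Isaac", "Smith, John", "Unknown, Person"], INDEX
)
```

Storing IDs in a set means the same person appearing in two overlapping exports is not mistaken for an ambiguity, while two different IDs under one name are.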