cahn | Frederick the Great discussion post 9

Okay, if I haven't made any mistakes:

1) You should both have access to the Fritzian Library - Restricted Section.
2) The link https://drive.google.com/drive/folders/1sWCLDZ2X8-jWKK3bYMtfNgbrUGMuYwlp?usp=sharing should work for anyone to view the main library, without having to ask permission.
3) You two (but no one else) should have edit access to the library, in case you want to upload new items or fix any weirdnesses that may be bugging you in the machine-translated correspondence. The one thing I've fixed so far is having Fritz and Suhm walk under the beech trees and elms instead of the beech trees and abalones, lol.

I don't know if you have access to delete, but it goes without saying, I trust you not to. I've created backup copies in a separate, private folder, because accidents happen. (Because Google Drive doesn't make it easy to copy a folder, you may have gotten per-item notifications; sorry for the spam if so.)

My criteria for the Restricted Section are recently published, largely original, monograph-length works. Journal articles, largely unoriginal works (e.g. translations of 18th century documents), or long-ago published works are freely available.

If you run into any permissions problems, let me know. Otherwise, enjoy!

Btw, I did work in a library once, long ago. :) Happy to be royal librarian in our fandom!

Updates from the royal librarian:

1) Political correspondence to AW, Heinrich, and Ulrike up through September 1761 (1758 for AW, obvs) has been uploaded. After September 1761, it gets a lot trickier.

2) Deleted some duplicate files.

3) Swapped out a nearly unreadable copy of Lavisse's Youth of Frederick for a more readable one.

Enjoy!

Thank you! I most certainly do. As evidenced by this quote from Fritz to Ulrike when she suggests marrying Amalie to the Danish King:

I thank you, my dear sister, for your good intentions for my sister Amalie, but we are in no hurry to marry her. If the King of Denmark requests it, then we will have to see what we can advise, but the party is not at all as advantageous as it seems, there are children from the first marriage, and my sister will not have all the credit you seem to think, not to mention that these kinds of alliances often lead to greater embarrassment than they are useful. Besides, I do not like to throw my sisters at people.

Says the man who encouraged Ulrike to keep flirting with Voltaire so he can write love poetry to him, and married her to the Swedes where she's been seething about being stuck in a constitutional monarchy ever since.

But if Fritz said something and didn't immediately do the opposite, would he really be Fritz?

It occurred to me after I went to bed that I had made a mental note to check for multi-page letters and make the code solve that problem, but I'd forgotten and uploaded without doing that. So there may be truncated letters, but it's on my list of things to fix. Most letters are short enough to fit onto a single page, though.

I also started looking into OCR APIs last night, because of course I did. :P

ETA: *checks fandom email*

AND I'm notified that the much more important bug of "I uploaded entirely the wrong file for Heinrich" is there too. *facepalm*

Okay, let me get this straightened out.

ETA 2: Heinrich correspondence should be fixed now. I need to leave for an appointment soon, but I will check AW and Ulrike for truncated multi-page letters when I get back.

ETA 3: Okay, mistakes were made (not by me, of course :P ), but Ulrike and AW should be fixed now too.

Let me know if you run into any other issues and I will ~~find someone to cashier over it~~ fix them.

Edited 2020-01-16 01:45 (UTC)

HAHAHAHA OK this made me laugh.

Let me know of your progress on the OCR APIs :P

Progress on OCR APIs is good. I have a proof of concept, and I don't think it would take much more work to get the full set of images submitted in an automated way to the API.

The cost appears to be $1.50, which is more than reasonable.

Now the tricky part is the output of 1,678 images that would need to be manually inspected and cleaned up before being fed to Google Translate. The OCR quality seems pretty good for individual words, but it tends to move entire lines around, and of course there's a lot of extraneous text (footnotes and such) that you don't want to feed to Google Translate, and you'd have to correct some of the formatting by hand. That's all stuff I could do automatically when the pages had already been converted to text, conveniently marked up with HTML tags that my code could detect.

I'm now debating whether I want to do that much OCR cleanup by hand. Convince me, guys.

Btw, that's 1541 images for Heinrich, 107 for Ulrike, and 30 for Peter III (because why not), between late 1761 and early 1782.

...Yeah, he wrote to Heinrich a lot.

Edited 2020-01-17 03:36 (UTC)

Yeah, he wrote to Heinrich a lot.

Once a week when they weren't having one of their "not talking to you" phases, according to biographers. The term "mutual addiction" does come to mind. But wow, that would be an awful lot of work...

It would. It occurred to me I could potentially be bribed into it with books from my wishlist, but right now I'm prioritizing whipping some posts for rheinsberg into shape.

But if anyone wants to bribe me, the next item on my wishlist is a 200-page, 20-euro monograph called "Des Königs Knabe: Friedrich der Große und Antinous." If anyone wanted the remaining Heinrich letters, someone could promise to Venmo me the money after I deliver the letters and someone else could read and summarize the volume. (An obvious division of labor suggests itself here. ;) Especially since someone has already expressed willingness to commission the Wilhelmine letters, and ended up getting those and many others for free.)

I'm all about book bribes :P :D Just saying, I am up for this!

Sweeet! Well, let me catch up on rheinsberg, and then see just how time-consuming the OCR cleanup will be.

You guuuuys...I must show off!

So last night I needed a break from assembling material for Rheinsberg posts, and since I was very tired, I thought some mindless OCR cleanup would do the trick.

Being me, I almost immediately decided to start seeing if I could solve the biggest problem in an automated way. And the biggest problem was that moving around of lines that I'd talked about. For instance, the following four lines:

souciant pas de sa perte et relevant toujours les assaillants de nouvelles
troupes, la garnison avait été forcée. Voilà cependant des circon-
stances que je ne saurais vous garantir, n'ayant pas de nouvelles sûres
sur cela.

were rendered by the Google API as:

souciant pas de sa perte et relevant toujours les assaillants de nouvelles
troupes, la garnison avait été forcée.
stances que je ne saurais vous garantir, n'ayant pas de nouvelles sûres
sur cela.
Voilà cependant des circon-

Which has all the right words, but half of line 2 is suddenly a new line 5. And that just seemed weird.

Well, from my reverse-engineering, it looks like Google is doing OCR as a two-step process:

1) Detecting the location of each individual word on the page.
2) Assembling the words together in text form and giving them to the user.

Well, because Google is nice like that (thank you, Google!), the API actually gives you the results of both steps. In other words, I was getting, not only the garbled text printout above, but also each individual word with x and y coordinates. For example:

{ "description": "avait", "boundingPoly": { "vertices": [ { "x": 323, "y": 874 }, { "x": 368, "y": 873 }, { "x": 368, "y": 888 }, { "x": 323, "y": 889 } ] } },

{ "description": "été", "boundingPoly": { "vertices": [ { "x": 386, "y": 875 }, { "x": 412, "y": 875 }, { "x": 412, "y": 889 }, { "x": 386, "y": 889 } ] } },

{ "description": "forcée.", "boundingPoly": { "vertices": [ { "x": 429, "y": 874 }, { "x": 487, "y": 873 }, { "x": 487, "y": 889 }, { "x": 429, "y": 890 } ] } },

{ "description": "stances", "boundingPoly": { "vertices": [ { "x": 95, "y": 903 }, { "x": 160, "y": 903 }, { "x": 160, "y": 916 }, { "x": 95, "y": 916 } ] } }

And by closely inspecting the x and y coordinates of individual words that were getting returned out of order, I realized that the coordinates were correct. It was step 2, assembly of individual words, that the Google API was getting wrong.

Well, step 1 requires a team of Google-level engineers and I would never try it, but step 2 is pretty easy. You just have to sort a bunch of numbers in order to get the correct order, and then print out the words in the correct order, sans coordinates.

I did it! I now have a script that bypasses the printout from Google and constructs its own printout based on the raw coordinates.
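For anyone curious what that kind of step 2 reassembly can look like, here is a minimal Python sketch. It is not the actual script, just an illustration under some assumptions: the input is a list of word annotations shaped like the JSON above (a "description" plus four "vertices"), any whole-page annotation has already been dropped from the list, the page is a single column, and the scan is not badly skewed. The function names word_box, rebuild_lines, and box are made up for this example.

```python
def word_box(annotation):
    """Return (text, left_x, center_y, height) for one word annotation."""
    vertices = annotation["boundingPoly"]["vertices"]
    # Defensive assumption: treat a coordinate missing from the JSON as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (annotation["description"], min(xs),
            (min(ys) + max(ys)) / 2, max(ys) - min(ys))


def rebuild_lines(annotations):
    """Group words into printed lines by y, then order each line by x."""
    # Sort top to bottom by the vertical center of each word.
    words = sorted((word_box(a) for a in annotations), key=lambda w: w[2])
    lines = []
    for text, x, y, height in words:
        # Same printed line if the centers differ by at most half a word height.
        if lines and abs(y - lines[-1]["y"]) <= height / 2:
            lines[-1]["words"].append((x, text))
        else:
            lines.append({"y": y, "words": [(x, text)]})
    # Within each line, sort left to right and drop the coordinates.
    return [" ".join(t for _, t in sorted(line["words"])) for line in lines]


def box(text, x0, y0, x1, y1):
    """Hypothetical helper: build an annotation shaped like the JSON above."""
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return {"description": text,
            "boundingPoly": {"vertices": [{"x": x, "y": y} for x, y in corners]}}


# The four words from the example, deliberately fed in the wrong order.
sample = [
    box("stances", 95, 903, 160, 916),
    box("forcée.", 429, 873, 487, 890),
    box("avait", 323, 873, 368, 889),
    box("été", 386, 875, 412, 889),
]
print("\n".join(rebuild_lines(sample)))
```

With the four sample words this prints "avait été forcée." on one line and "stances" on the next. The half-a-word-height tolerance is an arbitrary choice for the sketch, and footnotes or page headers would still come out mixed in with the letter text, so the manual cleanup is not avoided by this.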

Why is this important? Well, the detection of individual words looks good enough to me that as long as the words are in the correct order, I think I can get away with not comparing the OCR to the original text.

I still need to do manual cleanup of all 1500 or so letters, because automatically detecting things like footnotes, ends of letters, paragraph breaks, etc. is hard, and I'm going to have to go through and make judgments as to which text I'm interested in passing to the translate API and which text I want to discard.

But the key point here is that I can put all this in one file and eyeball it, and I do not have to open 1500 image files and compare them side-by-side to make sure the text is in the right place. If the first 5 or so letters I've tried this on are indicative, that is a solved problem.

Being able to scan through one long file and reformat things using macros is going to be a million times faster than opening 1500 images and moving my eyeballs back and forth as I try to make sure the OCR text matches the scanned text.

It's still going to be a little while before I can deliver (especially because of my backlog of rheinsberg posts and also some new posts I want to make, omg), but at least now I'm thinking days instead of weeks.

Um, if all this goes well and I'm able to deliver these OCRed-plus-translated letters, I'm still going to request the Antinous book. :P I think I'll have earned it.

I'm slain with admiration, or, to put it in a rococo way, I tremble! (At all the Katte posts anew, too.)

I must report my failures as well. I was holding off on replying to today's comments because I was heads-down on the Heinrich correspondence manual cleanup and I was Going! To! Finish! today and get you guys the letters! A few minutes ago, I finished the manual cleanup, completely forgot my mental note to back up the file (you see where this is going), ran the script for the next step, the script deleted the file, and I lost 3 days of time-consuming work.

*weep*

Why I wasn't backing up the file every HOUR or at least every day, I do not know. But I definitely meant to back it up as soon as I was done with the three days of cleanup.

It will hopefully take a little less time to redo than it did in the first place, since I did make improvements to the process as I went along, to speed things up, but depending on my mood, I may continue to be behind on comments for a bit, while I beat this thing into submission.

I was SO! CLOSE!

Oh no! :( I'm sorry, backup shenanigans are THE MOST FRUSTRATING thing!

:( It's my own fault.

If only I had a monkey I could blame. ~~Crackfic where Mimi was framed.~~

Have made good progress on recouping our losses; have backed up; will continue tomorrow.

Okay, I'm calling it a night. Translation is done; I found 3 truncated letters that I will fix manually in the morning; then if I don't find any more bugs, I will upload it and you can tell me about all the bugs you find. ;)

More or less caught up on post 9 comments, so tomorrow after finishing Heinrich, the plan is to catch up on post 10, selenak's no doubt awesome fic, Rheinsberg posts, and any new comments that come in while I'm asleep! This order of events subject to being interrupted by any incoming comments with the word "Katte" in the subject. :P

Librarian Mildred here to report a generous 9-volume donation from our royal patron, consisting of all 4 volumes of Preuss's Lebensgeschichte as well as the 5 accompanying volumes of source documents.

Our patron sends his "appreciative regards to Hostess, Reader and all others in orbit of the salon."

Wow. But good lord, I won't be able to read this for a loooong while now. Doesn't mean I'm not grateful as hell for the future opportunity!

Nobody expects you to! But it's there for reference purposes now. :) Like the time I wanted to see if Preuss had anything to say about Fredersdorf and financial irregularities (he didn't, which makes me even more suspicious).

Fritzian library

Re: Fritzian library
