Re: Henri de Catt

Date: 2020-01-30 02:01 am (UTC)
mildred_of_midgard: (0)
*nods* Yes, it seems that unless a given Fritz statement can be cross-checked against the diary, we have to put it in double quotes.

Well, good news: I've done a proof of concept, and OCR + translate on the diary is going to be perfectly doable. The really tedious part is going to be cropping the images to remove all that stuff in the margins (like dates) that would confuse OCR + translate. Fortunately, unlike the correspondence, there's only about a hundred pages.

If the margins were the same size on every image [ETA: I mean from one image to the next, even if the top, bottom, left, and right margins differed from each other], I could do it with a single line of code, but alas, the margins are different on each page. Soooo, I either learn how to do something mathy that sounds fascinating but is beyond my cognitive capacity right now (it wouldn't be at a normal time, dammit, but at a normal time I wouldn't be doing techy stuff in my spare time; I like my hobbies to be different from my day job), or I manually crop images.
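For the record, one version of the "mathy" route might look roughly like the sketch below. It is not something I've run on the diary scans, just an illustration of the idea: count the dark pixels in each column and each row (a "projection profile"), then keep the widest inked block and drop whatever sits beyond a blank gutter. It assumes the page is already a grayscale grid (a list of rows, 0 = black, 255 = white), that the margin notes are separated from the body text by a blank strip, and that the body is the widest block on the page. The names `ink_profile`, `widest_run`, and `crop_to_body` are made up for this example.

```python
def ink_profile(pixels, axis, dark_threshold=128):
    """Count dark pixels per column (axis=0) or per row (axis=1)."""
    if axis == 1:
        return [sum(1 for v in row if v < dark_threshold) for row in pixels]
    return [sum(1 for v in col if v < dark_threshold) for col in zip(*pixels)]


def widest_run(profile, min_ink=1, max_gap=0):
    """Return (start, end) of the longest inked run, end exclusive.

    Up to max_gap consecutive blank entries are tolerated inside a run,
    so the white space between lines of text does not split the body.
    """
    best = (0, 0)
    start = None      # index where the current run began
    last_ink = None   # most recent inked index in the current run
    for i, count in enumerate(profile):
        if count >= min_ink:
            if start is None:
                start = i
            last_ink = i
        elif start is not None and i - last_ink > max_gap:
            if last_ink + 1 - start > best[1] - best[0]:
                best = (start, last_ink + 1)
            start = None
    # Close a run that reaches the edge of the page.
    if start is not None and last_ink + 1 - start > best[1] - best[0]:
        best = (start, last_ink + 1)
    return best


def crop_to_body(pixels, dark_threshold=128, min_ink=1, col_gap=0, row_gap=0):
    """Crop to the widest inked block, horizontally first, then vertically."""
    cols = ink_profile(pixels, 0, dark_threshold)
    left, right = widest_run(cols, min_ink, col_gap)
    body_cols = [row[left:right] for row in pixels]
    rows = ink_profile(body_cols, 1, dark_threshold)
    top, bottom = widest_run(rows, min_ink, row_gap)
    return [row[left:right] for row in pixels[top:bottom]]


# Toy page: a two-pixel "date" in the left margin, a blank gutter,
# then a body block with one blank row standing in for line spacing.
page = [[255] * 10 for _ in range(7)]
page[1][0] = page[1][1] = 0
for r in (1, 2, 4, 5):
    for c in range(4, 9):
        page[r][c] = 0

cropped = crop_to_body(page, row_gap=1)
```

On real scans the thresholds and gap sizes would need tuning per page (specks and skew will confuse it), and the grid would have to come from an image library rather than being typed in by hand.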

In my current state, 100 manual image crops it will be! Hopefully I don't also have to manually set the permissions on each image, but I will if I have to. [ETA: do not have to manually set permissions! Got the one line of code approach to work.]

Oh, the downside will be that if you want to know the date for a given entry, you'll have to look it up in the original file. But I'll leave in the page numbers so it's at least easy to find.

Hoping this won't take more than a couple days, although it partly depends on how many images I can bring myself to crop in one session before I wander off and do something else. ;) And also how many comments with the magic K word come in!