*nods* Yes, it seems that unless a given Fritz statement can be cross-checked against the diary, we have to put it in double quotes.
Well, good news: I've done a proof of concept, and OCR + translate on the diary is going to be perfectly doable. The really tedious part is going to be cropping the images to remove all that stuff in the margins (like dates) that would confuse OCR + translate. Fortunately, unlike the correspondence, there's only about a hundred pages.
If the margins were all the same size [ETA: I mean from one image to the other, even if the top, bottom, left, and right margins were different from each other], I could do it with a single line of code, but alas, the margins are different on each page. Soooo, I either learn how to do something mathy that sounds fascinating but is beyond my cognitive capacity at this time (it wouldn't be at a normal time, dammit, but at a normal time I wouldn't be doing techy stuff in my spare time; I like my hobbies to be different from my day job), or I manually crop images.
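For the curious, here's roughly what the same-margins case would look like: a minimal sketch, not what I actually ran. The function name `crop_box`, the page names, and all the pixel values are made up for illustration. It only computes the rectangle to keep, as a (left, upper, right, lower) tuple, which is the box convention that image libraries such as Pillow's `Image.crop` accept.

```python
def crop_box(width, height, left, top, right, bottom):
    """Return (left, upper, right, lower) for the region inside the margins."""
    box = (left, top, width - right, height - bottom)
    # Refuse margins that would leave an empty or inverted rectangle.
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("margins leave nothing to keep")
    return box


# Hypothetical page sizes (width, height) and margins, all in pixels.
page_sizes = {"page_001": (1000, 1500), "page_002": (1000, 1500)}
margins = dict(left=120, top=80, right=60, bottom=100)

# With identical margins on every page, the whole batch is one expression.
boxes = {name: crop_box(w, h, **margins) for name, (w, h) in page_sizes.items()}
print(boxes["page_001"])
```

With margins that differ per page, `margins` would have to become a per-page lookup, and filling that in is exactly the manual (or mathy) part.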
In my current state, 100 manual image crops it will be! Hopefully I don't also have to manually set the permissions on each image, but I will if I have to. [ETA: do not have to manually set permissions! Got the one line of code approach to work.]
Oh, the downside will be that if you want to know the date for a given entry, you'll have to look it up in the original file. But I'll leave in the page numbers so it's at least easy to find.
Hoping this won't take more than a couple days, although it partly depends on how many images I can bring myself to crop in one session before I wander off and do something else. ;) And also how many comments with the magic K word come in!
Re: Henri de Catt
Date: 2020-01-30 02:01 am (UTC)