19:14 |
thimbronion |
One way to tackle all of the junk data in the encyclopedia entries is to run through each entry and check each word against a dictionary and a name dictionary to generate a list of OCR junk associated with each entry. |
19:18 |
whaack |
seems like that'll help a lot but it'll leave the most confusing OCR mistakes for readers (ones that accidentally map to another word) |
19:19 |
thimbronion |
True. Idk how to handle that case. |
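
The dictionary check discussed above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `find_ocr_junk`, the `entries` mapping of title to text, and the tokenizing regex are all hypothetical and not from the log. Names are matched case-insensitively for simplicity.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters, with an optional apostrophe part.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def find_ocr_junk(entries, dictionary, names):
    """Map each entry title to the words found in neither word list.

    entries:    dict of title -> entry text (assumed layout)
    dictionary: iterable of ordinary words
    names:      iterable of proper names
    """
    known = {w.lower() for w in dictionary} | {n.lower() for n in names}
    junk = {}
    for title, text in entries.items():
        unknown = []
        for token in TOKEN_RE.findall(text):
            # Keep each unknown token once, in order of first appearance.
            if token.lower() not in known and token not in unknown:
                unknown.append(token)
        if unknown:
            junk[title] = unknown
    return junk
```

As noted in the exchange, a check like this only catches tokens that are not words at all. An OCR error that happens to produce another valid word (say, "modem" for "modern") passes straight through.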
| |
~ 32 minutes ~ |
19:51 |
whaack |
thimbronion: All I can think of is running two OCRs and then flagging all the mismatches. |
19:52 |
thimbronion |
whaack: good idea. |
19:52 |
whaack |
but I imagine that the OCR algorithm itself probably has this type of check built in, so you'd probably need a really different OCR |
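
A rough sketch of the two-OCR idea, assuming the two engines' outputs for the same page are already available as plain strings. The function name `flag_mismatches` is hypothetical; the alignment uses `difflib.SequenceMatcher` from the Python standard library, which is one possible way to line up the two word streams, not something specified in the log.

```python
import difflib

def flag_mismatches(text_a, text_b):
    """Return (a_words, b_words) pairs wherever the two OCR outputs disagree."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False so frequent words are not treated as noise on long pages.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # One side may be empty if a word was inserted or dropped.
            mismatches.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return mismatches
```

This only flags spots where the engines differ. If both engines make the same mistake, nothing is flagged, which is why the two OCRs would need to be quite different from each other for this to help.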
19:54 |
thimbronion |
If only I had like 20 slavegirls. |
19:56 |
whaack |
lulz |
19:56 |
thimbronion |
I could just reward them for finding errors with whippings. |