Archive for Thursday, 30th June 2022

Thursday, 30th June 2022

Release s3-ocr 0.3 — Tools for running OCR against files stored in S3

30th Jun 2022, 12:44 am

Release s3-credentials 0.12 — A tool for creating credentials for accessing S3 buckets

30th Jun 2022, 8:02 pm

Release s3-ocr 0.4 — Tools for running OCR against files stored in S3

30th Jun 2022, 9:03 pm

s3-ocr: Extract text from PDF files stored in an S3 bucket

I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR against PDF files in an S3 bucket, then writes the extracted text to a SQLite database with full-text search configured so you can run searches against it.
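
As a rough illustration of the last step, here is a minimal sketch of loading extracted page text into SQLite and searching it with full-text search. It is not s3-ocr's own code or schema: the table names `pages` and `pages_fts`, the helper functions `build_index` and `search`, and the sample rows are all hypothetical, and it assumes your Python build of SQLite includes the FTS5 extension.

```python
import sqlite3


def build_index(pages):
    """Load (path, page, text) tuples into SQLite and index the text.

    Hypothetical schema for illustration only; s3-ocr's real tables may differ.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (path TEXT, page INTEGER, text TEXT)")
    db.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    # External-content FTS5 table: the index points back at rows in `pages`.
    db.execute(
        "CREATE VIRTUAL TABLE pages_fts USING fts5(text, content='pages')"
    )
    db.execute(
        "INSERT INTO pages_fts (rowid, text) SELECT rowid, text FROM pages"
    )
    return db


def search(db, query):
    """Return (path, page) pairs whose text matches the full-text query."""
    rows = db.execute(
        """
        SELECT pages.path, pages.page
        FROM pages_fts
        JOIN pages ON pages.rowid = pages_fts.rowid
        WHERE pages_fts MATCH ?
        ORDER BY pages.path, pages.page
        """,
        (query,),
    )
    return rows.fetchall()


# Made-up sample text standing in for OCR output.
db = build_index([
    ("a.pdf", 1, "Minutes of the harbor commission"),
    ("a.pdf", 2, "Budget appendix"),
    ("b.pdf", 1, "Harbor dredging contract"),
])
print(search(db, "harbor"))
```

Keeping the text in an ordinary table and pointing the FTS index at it means the page rows stay queryable with plain SQL, while searches go through the index.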

[... 1,493 words]

9:40 pm / aws, ocr, pdf, projects, s3, weeknotes, s3-credentials


Simon Willison’s Weblog