Simon Willison’s Weblog

Thursday, 30th June 2022

Release s3-ocr 0.3 — Tools for running OCR against files stored in S3
Release s3-credentials 0.12 — A tool for creating credentials for accessing S3 buckets
Release s3-ocr 0.4 — Tools for running OCR against files stored in S3

s3-ocr: Extract text from PDF files stored in an S3 bucket

I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR text extraction against PDF files in an S3 bucket, then writes the resulting text out to a SQLite database with full-text search configured so you can run searches against the extracted data.
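To illustrate the last step of that pipeline, here is a minimal sketch of storing extracted page text in SQLite and searching it with full-text search. It is not s3-ocr's own code or schema: the table name `pages_fts`, its columns, and the helper functions `build_index` and `search` are hypothetical, and it assumes Python's `sqlite3` module is linked against a SQLite build with the FTS5 extension enabled.

```python
import sqlite3


def build_index(conn, pages):
    """Store (path, page, text) rows in a full-text indexed table.

    The table and column names are hypothetical, chosen for illustration;
    they are not taken from s3-ocr.
    """
    # FTS5 virtual table; "page" is kept alongside the text but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts "
        "USING fts5(path, page UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO pages_fts (path, page, text) VALUES (?, ?, ?)", pages
    )
    conn.commit()


def search(conn, query):
    """Return (path, page) for every row matching the query, best match first."""
    return conn.execute(
        "SELECT path, page FROM pages_fts WHERE pages_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample rows standing in for OCR output.
    build_index(
        conn,
        [
            ("reports/a.pdf", 1, "Annual budget summary for the city"),
            ("reports/a.pdf", 2, "Appendix with maps"),
            ("letters/b.pdf", 1, "Letter about the budget hearing"),
        ],
    )
    for path, page in search(conn, "budget"):
        print(path, page)
```

Keeping one row per page, rather than one per document, means a search hit can point back to the specific page of the PDF that matched.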

[... 1,493 words]
