10 items tagged “ibm”
2024
Docling. MIT licensed document extraction Python library from the Deep Search team at IBM, who released Docling v2 on October 16th.
Here's the Docling Technical Report paper from August, which provides details of two custom models: a layout analysis model for figuring out the structure of the document (sections, figures, text, tables etc) and a TableFormer model specifically for extracting structured data from tables.
Those models are available on Hugging Face.
Here's how to try out the Docling CLI using uvx (avoiding the need to install it first, though since it downloads models it will take a while to run the first time):
uvx docling mydoc.pdf --to json --to md
This will output a mydoc.json file with complex layout information and a mydoc.md Markdown file which includes Markdown tables where appropriate.
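If you want to drive that same command from a script, here is a minimal sketch. The helper names (docling_command, expected_outputs, run_docling) are my own and not part of Docling. It also assumes the output files land in the current working directory and are named after the input file's stem, which matches the mydoc.pdf example above but may not hold for every configuration.

```python
import shutil
import subprocess
from pathlib import Path


def docling_command(source, formats=("json", "md")):
    """Build the `uvx docling` argument list, one --to flag per format."""
    cmd = ["uvx", "docling", str(source)]
    for fmt in formats:
        cmd += ["--to", fmt]
    return cmd


def expected_outputs(source, formats=("json", "md")):
    """Guess the output paths: the input's stem plus each format's extension.

    Assumption: files are written to the current working directory.
    """
    stem = Path(source).stem
    return [Path(f"{stem}.{fmt}") for fmt in formats]


def run_docling(source, formats=("json", "md")):
    """Run the CLI and return the output files that actually exist afterwards."""
    if shutil.which("uvx") is None:
        raise FileNotFoundError("uvx was not found on PATH")
    subprocess.run(docling_command(source, formats), check=True)
    return [path for path in expected_outputs(source, formats) if path.exists()]
```

Calling run_docling("mydoc.pdf") should then return the paths for mydoc.json and mydoc.md, provided uvx is installed and the assumptions above hold.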
The Python API is a lot more comprehensive. It can even extract tables as Pandas DataFrames:
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")
for table in result.document.tables:
    df = table.export_to_dataframe()
    print(df)
I ran that inside uv run --with docling python. It took a little while to run, but it demonstrated that the library works.
2009
Scaling Django web apps on Apache. Cool to see this kind of article cropping up on IBM developerWorks, but it’s a shame they don’t mention mod_wsgi.
DB2 support for Django is coming. From IBM, under the Apache 2.0 License. I’m not sure if this makes it hard to bundle it with the rest of Django, which uses the BSD license.
2008
Damien Katz: New Gig. IBM have employed Damien Katz to work full time on CouchDB. The work will be under the Apache license with the ASF owning the copyright.
2006
Introducing Operator. New microformat detecting Firefox extension, developed at IBM and released by Mozilla Labs. Examples are from Yahoo! Local, Upcoming and Flickr.
2005
IBM poop heads say LAMP users need to “grow up”. Ryan blows away a ton of the myths surrounding LAMP.
Nope. We call bullshit. After wasting years of our lives trying to implement physical three tier architectures that "scale" and failing miserably time after time, we're going with something that actually works.
IBM: ’LAMP’ users need to grow up (via) Which is why Friendster switched from JSP to PHP. Pfft.
IBM to Free Java—Next Week? The question mark means it’s a rumour.
2003
Linux on the desktop at IBM
Spotted on Slashdot, IBM’s Open Source Desktop—Directions for today... and Tomorrow presentation includes one slide that really caught my attention:
[... 95 words]
2002
IBM accessibility center
IBM’s Accessibility Center has a plethora of useful information and resources, including a free 30 day trial of their Home Page Reader text-to-speech browser software.