[okfn-labs] File type detection in python-magic, MS Office files
Marian Steinbach
marian at sendung.de
Tue Apr 16 12:55:19 UTC 2013
Hi everybody!
I am trying to guess the correct MIME type and file extension for binary
files scraped from a server, in Python. Since this must have come up in
many projects, I'm curious whether there is a robust solution.
The python-magic module uses libmagic in the background to guess file
types. (This means that results may vary from platform to platform.)
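For reference, a minimal sketch of the kind of lookup I mean. The helper name guess_type_and_extension and its detect parameter are made up for illustration; the default path assumes python-magic's magic.from_file(path, mime=True), and the extension comes from the standard library's mimetypes table, which may not know every type.

```python
import mimetypes


def guess_type_and_extension(path, detect=None):
    """Return (mime_type, extension) for the file at `path`.

    `detect` is a hypothetical hook: any callable mapping a path to a MIME
    string. If omitted, python-magic is used.
    """
    if detect is None:
        # Imported lazily so the helper can be exercised without libmagic.
        import magic  # python-magic
        detect = lambda p: magic.from_file(p, mime=True)
    mime = detect(path)
    # guess_extension returns None for MIME types it does not know.
    return mime, mimetypes.guess_extension(mime)
```

The extension is only as good as the detected MIME type, so a wrong guess from libmagic carries straight through to the file name.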
Currently I am developing on Mac OS, and I get these results for the six
most common MS Office formats (which are, besides PDF, the most important
ones for me):
.XLS:  application/vnd.ms-excel (okay)
.XLSX: application/vnd.ms-excel
       expected: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.PPT:  application/msword
       expected: application/vnd.ms-powerpoint
.PPTX: application/vnd.ms-powerpoint
       expected: application/vnd.openxmlformats-officedocument.presentationml.presentation
.DOC:  application/msword (okay)
.DOCX: application/msword
       expected: application/vnd.openxmlformats-officedocument.wordprocessingml.document
python-magic's Magic class accepts a file path to an alternative magic file.
Does anybody here have experience with creating a magic file that
python-magic digests, especially one that helps properly recognize the
office formats?
Is it possible to use one magic file across different platforms? I've had
bad luck with a magic and a magic.mgc file copied from Ubuntu. Both produce
error messages.
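One possible fallback for the three newer formats, independent of libmagic: .docx, .xlsx and .pptx files are ZIP containers with a [Content_Types].xml entry and their main parts under word/, xl/ or ppt/. The sketch below relies only on that layout; the names OOXML_TYPES and sniff_ooxml are made up for illustration, and it does not help with the older .doc, .xls and .ppt files.

```python
import zipfile

# Directory of the main document part inside the ZIP container -> MIME type.
OOXML_TYPES = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def sniff_ooxml(path):
    """Return the OOXML MIME type for `path`, or None if it is not recognized."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    if "[Content_Types].xml" not in names:
        return None  # a ZIP archive, but not an Office Open XML package
    for prefix, mime in OOXML_TYPES.items():
        if any(name.startswith(prefix) for name in names):
            return mime
    return None
```

A None result would mean falling back to whatever python-magic reports.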
Thanks!
Marian