[okfn-labs] File type detection in python-magic, MS Office files

Rufus Pollock rufus.pollock at okfn.org
Tue Apr 16 14:09:16 UTC 2013


Is there a reason to use libmagic at all, rather than just the mimetypes
module from the Python standard library?

The standard Python mimetypes module guesses from the file name alone, and
it does a pretty good job on all of the formats you list. (In my experience
magic / libmagic is more useful where you have files with no extension and
you need to guess from the file content.) For example:

>>> import mimetypes
>>> mimetypes.guess_type('abc.xlsx')
('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', None)

>>> mimetypes.guess_type('abc.ppt')
('application/vnd.ms-powerpoint', None)
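
If some of the scraped files turn out to have no usable extension, the two
approaches can be combined: trust the extension when mimetypes recognises
it, and only look at the content otherwise. A rough sketch, assuming
python-magic is installed and provides magic.from_file(path, mime=True);
guess_mime is just a name made up for this illustration:

```python
import mimetypes

try:
    import magic  # python-magic; optional, needs libmagic on the system
except ImportError:
    magic = None


def guess_mime(path):
    """Hypothetical helper: try the extension first, then the content."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None:
        return mime
    if magic is not None:
        # Only reached when the extension is missing or unknown.
        # Assumes the file exists on disk so libmagic can read it.
        return magic.from_file(path, mime=True)
    return None
```

With this ordering the Office files keep the types reported by mimetypes,
and libmagic is consulted only where the name gives no hint.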

Rufus

On 16 April 2013 13:55, Marian Steinbach <marian at sendung.de> wrote:
> Hi everybody!
>
> I am trying to guess the correct mime type and file extension for binary
> files scraped from a server, within python. Since this is something that
> should have come up in a multitude of projects, I'm curious if there is a
> robust solution.
>
> The python-magic module uses libmagic in the background to guess file types.
> (This means that results may vary from platform to platform.)
>
> Currently I am developing on Mac OS and I have these results for the six
> most common MS Office formats (which are, besides PDF, the most important
> ones for me).
>
> .XLS: application/vnd.ms-excel (okay)
>
> .XLSX: application/vnd.ms-excel
>   expected:
> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>
> .PPT: application/msword
>   expected: application/vnd.ms-powerpoint
>
> .PPTX: application/vnd.ms-powerpoint
>   expected:
> application/vnd.openxmlformats-officedocument.presentationml.presentation
>
> .DOC: application/msword (okay)
>
> .DOCX: application/msword
>   expected:
> application/vnd.openxmlformats-officedocument.wordprocessingml.document
>
> python-magic's Magic class accepts a file path to an alternative magic file.
>
> Does anybody here have experience with creating a magic file that
> python-magic digests, especially one that helps properly recognize the
> office formats?
>
> Is it possible to use one magic file across different platforms? I've had
> bad luck with a magic and magic.mgc file copied from Ubuntu. Both create
> error messages.
>
> Thanks!
>
> Marian