[okfn-labs] SEC EDGAR scraping experience?

Friedrich Lindenberg friedrich at pudo.org
Tue Sep 30 20:50:33 UTC 2014


Hey Rufus (and labs :), 

I was browsing around for info about scraping the SEC’s EDGAR database and was delighted to see that some of the first results were your work on it [1], [2]. I’m thinking about looking into that data casually, and I was wondering whether you could help me with a few questions:

1) Do you have any sense of how large a full scrape of the data (the XML portion at least) might be?

2) Did you ever play with any of the available parsers for the actual SGML filings? [3] suggests this might be quite traumatic for the untrained explorer.

3) Similarly, did you ever try out any of the Python tooling for XBRL?

Having asked (3), I will now drink to forget and wish you all a pleasant evening. 

- Friedrich 


[1] http://okfnlabs.org/blog/2014/03/04/sec-edgar-database.html
[2] https://github.com/datasets/edgar
[3] https://stackoverflow.com/questions/13504278/parsing-edgar-filings 