[okfn-labs] SEC EDGAR scraping experience?
Friedrich Lindenberg
friedrich at pudo.org
Tue Sep 30 20:50:33 UTC 2014
Hey Rufus (and labs :),
I was browsing around for info about scraping the SEC’s EDGAR database and was delighted to see that some of the first results were your work on it [1], [2]. I’m thinking about looking into that data casually, and I was wondering whether you might be able to help me with a few questions:
1) Do you have any sense of how large a full scrape of the data (the XML portion at least) might be?
2) Did you ever play with any of the available parsers for the actual SGML filings? Judging by [3], this might be quite traumatic to the untrained explorer.
3) Similarly, did you ever try out any of the Python tooling for XBRL?
Having asked (3), I will now drink to forget and wish you all a pleasant evening.
- Friedrich
[1] http://okfnlabs.org/blog/2014/03/04/sec-edgar-database.html
[2] https://github.com/datasets/edgar
[3] https://stackoverflow.com/questions/13504278/parsing-edgar-filings