[Open-contentmining] How can I help?

Peter Murray-Rust pm286 at cam.ac.uk
Fri Dec 13 07:48:53 UTC 2013


I'm delighted to have had an enquiry about how to help with content-mining.
The good news is:

*Everyone has a role to play in content-mining*

Here are some important areas - please submit others. There are lots of
micro-tasks that everyone can become involved in.

==project==

* identifying a need
* coordinating a community effort
* summarising current practice (e.g. rights, barriers, resources)
* creating resources (e.g. corpora)
* running a project

==crawling==

* identifying sites to mine
* collecting bibliographic metadata (e.g. tables of contents)
* agreeing web-friendly protocols (e.g. delay times)
* writing or finding crawlers
* creating or deploying crawl scripts
* managing workflow manually or automatically
* recording crawl logs
* saving crawled materials
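
As a rough illustration of the "delay times", "crawl log" and "saving" items above, here is a minimal Python sketch. The function name `polite_crawl`, its parameters and the 5-second default are illustrative assumptions, not an agreed protocol or an existing tool; the caller supplies the function that actually fetches a page.

```python
import time


def polite_crawl(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and logging outcomes.

    `fetch` is any callable taking a URL and returning its content
    (hypothetical; supply your own). `delay_seconds` is an assumed value
    that should be agreed with the site being mined.
    """
    saved = {}  # url -> content for successful fetches
    log = []    # one entry per attempted URL
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request after the first
        try:
            saved[url] = fetch(url)
            status = "ok"
        except Exception as exc:  # record the failure and carry on
            status = f"error: {exc}"
        log.append({"url": url, "status": status, "time": time.time()})
    return saved, log
```

Passing `sleep` as a parameter lets the delay be replaced in tests; in real use the default `time.sleep` applies.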

==document==

* formalising structure of document (e.g. sections)
* creating or finding vocabularies for annotation

==generic tools==

* crawlers
* PDF readers
* flat text readers
* graphics analyzers
* image analyzers

==databases==

* customization

==natural language==

* collection of NLP tools
* vocabularies
* corpora for training
* training
* testing
* domain tools

==graphics==

* reconstruction of diagrams from primitives
* SVG tools

==images==

* selection
* cropping
* binarisation
* edge detection/segments
* optical character recognition
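
To give a flavour of how small some of these micro-tasks can be, here is a minimal Python sketch of binarisation by thresholding. It assumes a greyscale image held as a list of rows of 0-255 integers; the function name `binarise` and the fallback to the mean pixel value are illustrative choices, not a recommended method.

```python
def binarise(gray, threshold=None):
    """Turn a greyscale image (rows of 0-255 ints) into 0/1 pixels.

    If no threshold is given, the mean pixel value is used, which is a
    simple assumption made for this sketch.
    """
    pixels = [p for row in gray for p in row]
    if not pixels:
        return []
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    # Pixels at or above the threshold become 1 (light), the rest 0 (dark).
    return [[1 if p >= threshold else 0 for p in row] for row in gray]
```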

==text==

* fonts

==tables==

* reconstruction
* interpretation

==audio==

==video==

==semantics==

* annotation
* links

==domain==

* maths
* chemistry
* geo
* dates
* units of measurement

==argumentation==

* document structure
* sentiment analysis

==documentation==

==sociopoliticololegal==

==community==

* mailing lists
* crowdcrafting


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069