[okfn-discuss] Using a bot to move web pages (and convert to MediaWiki)

Chris Watkins chriswaterguy at appropedia.org
Wed Sep 17 17:20:08 UTC 2008


Any help on this appreciated!

How do we scr*pe the content from an entire site?  *(I put a * in place of
the "a" in scr*pe because spam filters often don't like that word.)*

Through Appropedia's Public Domain Search
<http://www.appropedia.org/Public_Domain_Search> (Beta) I've started to
identify some sites with a lot of good content on subjects like sustainable
agriculture, aid projects and energy efficiency. I'd really like to be able
to take everything off a given site, automatically, and put it onto
Appropedia, so the pages can then be wikified (by bot and manually). The
strategy is like Wikipedia being populated with an old edition of the
Encyclopaedia Britannica etc. Once the content is on the site, it's easier
for people to improve and expand those pages, rather than starting from
scratch.

I'm hoping there's a way of connecting a bot to a tool (such as the
Send2Wiki <http://www.mediawiki.org/wiki/Extension:Send2Wiki> extension, or
the tools mentioned at Appropedia:Porting formatted content to MediaWiki
<http://www.appropedia.org/Appropedia:Porting_formatted_content_to_MediaWiki>),
so we can take a whole list or directory of pages from their source all the
way to the wiki. Any ideas?
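
As a rough illustration of the kind of pipeline being asked about (a sketch,
not an existing tool), the Python below takes a list of page URLs, fetches
each one, and converts a small subset of HTML to MediaWiki markup using only
the standard library. The names html_to_wikitext, title_from_url and
convert_pages are hypothetical, the tag coverage is deliberately minimal, and
the final upload step is left out: that would presumably go through the
MediaWiki API or a bot framework, which is exactly the open question here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote
from urllib.request import urlopen

# Assumed mapping from a few HTML tags to MediaWiki markup (far from complete).
HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}
INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}


class WikiTextConverter(HTMLParser):
    """Collect MediaWiki markup for a small subset of HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.href = None  # target of the <a> currently open, if any
        self.skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in HEADINGS:
            self.out.append("\n\n" + HEADINGS[tag] + " ")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n* ")
        elif tag == "br":
            self.out.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[" + self.href + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in HEADINGS:
            self.out.append(" " + HEADINGS[tag] + "\n")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a" and self.href:
            self.out.append("]")
            self.href = None

    def handle_data(self, data):
        if not self.skip:
            # Collapse whitespace runs to one space, as a browser would.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_wikitext(html):
    """Convert an HTML string to (approximate) MediaWiki markup."""
    parser = WikiTextConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    lines = [re.sub(r" +", " ", line).strip() for line in text.split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def title_from_url(url):
    """Derive a wiki page title from the last path segment of a URL."""
    parts = urlparse(url)
    name = unquote(parts.path.rstrip("/").rsplit("/", 1)[-1]) or parts.netloc
    name = re.sub(r"\.(html?|php|aspx?)$", "", name, flags=re.I)
    return name.replace("_", " ").replace("-", " ").strip()


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def convert_pages(urls, fetch_page=fetch):
    """Return {page title: wikitext} for each URL in the list."""
    return {title_from_url(u): html_to_wikitext(fetch_page(u)) for u in urls}
```

Calling convert_pages with a list of URLs gives a dictionary of page titles
and wikitext that a bot could then post to the wiki. Passing a different
fetch_page function makes it possible to try the conversion on saved files
before touching a live site.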

I've asked elsewhere with no luck yet, and Jonathan Gray suggested asking
here.

Thanks!
-- 
Chris Watkins (a.k.a. Chriswaterguy)

Appropedia.org - Sharing knowledge to build rich, sustainable lives.

Blog: chriswaterguy.livejournal.com/


Aiming for emails of 5 sentences or less - http://five.sentenc.es/

'They demanded bread and their method of making their protest was to burn
down the bakery.' - Ortega y Gasset

Buying at Amazon, eBay etc? Start at http://appropedia.maatiam.com and
support Appropedia - at no extra cost.