[ckan-changes] commit/ckanext-harvest: 3 new changesets
Bitbucket
commits-noreply at bitbucket.org
Mon Jul 18 18:56:57 UTC 2011
3 new changesets in ckanext-harvest:
http://bitbucket.org/okfn/ckanext-harvest/changeset/ab9ca433f9dc/
changeset: ab9ca433f9dc
user: amercader
date: 2011-07-18 18:34:24
summary: Update README. Thanks to Rolf Kleef for the patch
affected #: 1 file (1.3 KB)
--- a/README.rst Tue Jun 28 15:04:40 2011 +0100
+++ b/README.rst Mon Jul 18 17:34:24 2011 +0100
@@ -5,7 +5,7 @@
This extension provides a common harvesting framework for ckan extensions
and adds a CLI and a WUI to CKAN to manage harvesting sources and jobs.
-Dependencies
+Installation
============
The harvest extension uses Message Queuing to handle the different gather
@@ -15,9 +15,20 @@
sudo apt-get install rabbitmq-server
-The extension uses `carrot` as messaging library::
+Clone the repository and set up the extension
- http://ask.github.com/carrot/
+ hg clone https://bitbucket.org/okfn/ckanext-harvest
+
+ cd ckanext-harvest
+
+ pip install -r pip-requirements.txt
+
+ python setup.py develop
+
+Make sure the configuration ini file contains the harvest main plugin, as
+well as the harvester for CKAN instances (included with the extension)
+
+ ckan.plugins = harvest ckan_harvester
Configuration
@@ -36,6 +47,11 @@
paster sysadmin add harvest
+After installation, the harvest interface should be available under /harvest
+if you're logged in with sysadmin permissions, eg.
+
+ http://localhost:5000/harvest
+
Tests
=====
@@ -84,6 +100,13 @@
harvester fetch_consumer
- starts the consumer for the fetching queue
+ harvester import [{source-id}]
+ - perform the import stage with the last fetched objects, optionally
+ belonging to a certain source.
+ Please note that no objects will be fetched from the remote server.
+ It will only affect the last fetched objects already present in the
+ database.
+
The commands should be run from the ckanext-harvest directory and expect
a development.ini file to be present. Most of the time you will specify
the config explicitly though::
@@ -93,7 +116,12 @@
The CKAN haverster
==================
-TODO
+The plugin includes a harvester for remote CKAN instances. To use it, you need
+to add the `ckan_harvester` plugin to your options file:
+
+ ckan.plugins = harvest ckan_harvester
+
+After adding it, a 'CKAN' option should appear in the 'New harvest source' form.
The harvesting interface
@@ -213,10 +241,14 @@
:returns: True if everything went right, False if errors were found
'''
-See ckanext-inspire for a an example on how to implement the harvesting
+See the CKAN harvester for a an example on how to implement the harvesting
interface:
- https://bitbucket.org/okfn/ckanext-inspire/src/
+ ckanext-harvest/ckanext/harvest/harvesters/ckanharvester.py
+
+Here you can also find other examples of custom harvesters:
+
+ https://bitbucket.org/okfn/ckanext-pdeu/src/213d3fe4c36e/ckanext/pdeu/harvesters/
Running the harvest jobs
@@ -237,3 +269,8 @@
pending harvesting jobs::
paster harvester run --config=../ckan/development.ini
+
+After packages have been imported, the search index will have to be updated
+before the packages appear in search results (from the ckan directory):
+
+ paster search-index
http://bitbucket.org/okfn/ckanext-harvest/changeset/336acf7dbca6/
changeset: 336acf7dbca6
user: amercader
date: 2011-07-18 18:35:03
summary: Add docs to base harvester functions
affected #: 1 file (520 bytes)
--- a/ckanext/harvest/harvesters/base.py Mon Jul 18 17:34:24 2011 +0100
+++ b/ckanext/harvest/harvesters/base.py Mon Jul 18 17:35:03 2011 +0100
@@ -21,17 +21,24 @@
class HarvesterBase(SingletonPlugin):
'''
- Generic class for publicdata.eu harvesters
+ Generic class for harvesters with helper functions
'''
implements(IHarvester)
def _gen_new_name(self,title):
+ '''
+ Creates a URL friendly name from a title
+ '''
name = munge_title_to_name(title).replace('_', '-')
while '--' in name:
name = name.replace('--', '-')
return name
def _check_name(self,name):
+ '''
+ Checks if a package name already exists in the database, and adds
+ a counter at the end if it does exist.
+ '''
like_q = u'%s%%' % name
pkg_query = Session.query(Package).filter(Package.name.ilike(like_q)).limit(100)
taken = [pkg.name for pkg in pkg_query]
@@ -46,16 +53,26 @@
return None
def _save_gather_error(self,message,job):
+ '''
+ Helper function to create an error during the gather stage.
+ '''
err = HarvestGatherError(message=message,job=job)
err.save()
log.error(message)
def _save_object_error(self,message,obj,stage=u'Fetch'):
+ '''
+ Helper function to create an error during the fetch or import stage.
+ '''
err = HarvestObjectError(message=message,object=obj,stage=stage)
err.save()
log.error(message)
def _create_harvest_objects(self, remote_ids, harvest_job):
+ '''
+ Given a list of remote ids and a Harvest Job, create as many Harvest Objects and
+ return a list of its ids to be returned to the fetch stage.
+ '''
try:
object_ids = []
if len(remote_ids):
@@ -87,9 +104,7 @@
'''
try:
- #from pprint import pprint
- #pprint(package_dict)
- ## change default schema
+ # Change default schema
schema = default_package_schema()
schema["id"] = [ignore_missing, unicode]
@@ -144,9 +159,3 @@
self._save_object_error('%r'%e,harvest_object,'Import')
return None
-
-
-
-
-
-
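
Editorial note: the naming helpers documented in this changeset can be illustrated with a standalone sketch. It is not the extension's code. `slugify_title` is a hypothetical, simplified stand-in for CKAN's `munge_title_to_name`, a plain set of taken names replaces the `Package` query, and the counter logic in `check_name` is an assumption based on the new docstring, because the diff does not show that part of `_check_name`.

```python
import re


def slugify_title(title):
    # Hypothetical stand-in for CKAN's munge_title_to_name: lowercase the
    # title and replace every run of characters other than a-z, 0-9, '_'
    # and '-' with a single underscore.
    return re.sub(r'[^a-z0-9_-]+', '_', title.lower()).strip('_')


def gen_new_name(title):
    # Mirrors _gen_new_name from the diff: underscores become hyphens and
    # repeated hyphens are collapsed into one.
    name = slugify_title(title).replace('_', '-')
    while '--' in name:
        name = name.replace('--', '-')
    return name


def check_name(name, taken, limit=100):
    # Assumed behaviour (the diff only shows the docstring): return the
    # name unchanged if it is free, otherwise the first free name with a
    # numeric counter appended, or None if no candidate is free.
    if name not in taken:
        return name
    for counter in range(1, limit + 1):
        candidate = '%s%d' % (name, counter)
        if candidate not in taken:
            return candidate
    return None
```

With these assumptions, `gen_new_name('Air Quality -- 2011')` yields a hyphen-separated name with no doubled hyphens, and `check_name` only appends a counter when the plain name is already taken.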
http://bitbucket.org/okfn/ckanext-harvest/changeset/89ecca73e087/
changeset: 89ecca73e087
user: amercader
date: 2011-07-18 18:35:32
summary: Use API version defined in config if present
affected #: 1 file (69 bytes)
--- a/ckanext/harvest/harvesters/ckanharvester.py Mon Jul 18 17:35:03 2011 +0100
+++ b/ckanext/harvest/harvesters/ckanharvester.py Mon Jul 18 17:35:32 2011 +0100
@@ -20,7 +20,6 @@
'''
config = None
- #TODO: check different API versions
api_version = '2'
def _get_rest_api_offset(self):
@@ -44,6 +43,10 @@
def _set_config(self,config_str):
if config_str:
self.config = json.loads(config_str)
+
+ if 'api_version' in self.config:
+ self.api_version = self.config['api_version']
+
log.debug('Using config: %r', self.config)
else:
self.config = {}
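
Editorial note: the configuration override added in this changeset can be tried in isolation with the sketch below. `ConfigSketch` and `set_config` are hypothetical names used only for illustration; the real method lives on the CKAN harvester class and also logs the parsed config.

```python
import json


class ConfigSketch(object):
    # Class-level defaults, as in the diff.
    config = None
    api_version = '2'

    def set_config(self, config_str):
        # Parse the JSON config string of a harvest source, if any.
        if config_str:
            self.config = json.loads(config_str)
            # Use the API version from the config when it is present.
            if 'api_version' in self.config:
                self.api_version = self.config['api_version']
        else:
            self.config = {}


harvester = ConfigSketch()
harvester.set_config('{"api_version": "1"}')
print(harvester.api_version)
```

In this sketch the override is stored on the instance, so a later call with a config that lacks `api_version` keeps the previously set value instead of falling back to the class default. Whether that matters in practice depends on how harvester instances are reused.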
Repository URL: https://bitbucket.org/okfn/ckanext-harvest/
--
This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.