[ckan-changes] commit/ckanext-harvest: 3 new changesets

Bitbucket commits-noreply at bitbucket.org
Mon Jul 18 18:56:57 UTC 2011


3 new changesets in ckanext-harvest:

http://bitbucket.org/okfn/ckanext-harvest/changeset/ab9ca433f9dc/
changeset:   ab9ca433f9dc
user:        amercader
date:        2011-07-18 18:34:24
summary:     Update README. Thanks to Rolf Kleef for the patch
affected #:  1 file (1.3 KB)

--- a/README.rst	Tue Jun 28 15:04:40 2011 +0100
+++ b/README.rst	Mon Jul 18 17:34:24 2011 +0100
@@ -5,7 +5,7 @@
 This extension provides a common harvesting framework for ckan extensions
 and adds a CLI and a WUI to CKAN to manage harvesting sources and jobs.
 
-Dependencies
+Installation
 ============
 
 The harvest extension uses Message Queuing to handle the different gather
@@ -15,9 +15,20 @@
 
     sudo apt-get install rabbitmq-server
 
-The extension uses `carrot` as messaging library::
+Clone the repository and set up the extension
 
-    http://ask.github.com/carrot/
+	hg clone https://bitbucket.org/okfn/ckanext-harvest
+
+	cd ckanext-harvest
+
+    pip install -r pip-requirements.txt
+
+	python setup.py develop
+
+Make sure the configuration ini file contains the harvest main plugin, as
+well as the harvester for CKAN instances (included with the extension)
+
+	ckan.plugins = harvest ckan_harvester
 
 
 Configuration
@@ -36,6 +47,11 @@
 
     paster sysadmin add harvest
 
+After installation, the harvest interface should be available under /harvest
+if you're logged in with sysadmin permissions, eg.
+
+	http://localhost:5000/harvest
+
 Tests
 =====
 
@@ -84,6 +100,13 @@
       harvester fetch_consumer
         - starts the consumer for the fetching queue
 
+      harvester import [{source-id}]
+        - perform the import stage with the last fetched objects, optionally
+          belonging to a certain source.
+          Please note that no objects will be fetched from the remote server.
+          It will only affect the last fetched objects already present in the
+          database.
+
 The commands should be run from the ckanext-harvest directory and expect
 a development.ini file to be present. Most of the time you will specify
 the config explicitly though::
@@ -93,7 +116,12 @@
 The CKAN haverster
 ==================
 
-TODO
+The plugin includes a harvester for remote CKAN instances. To use it, you need
+to add the `ckan_harvester` plugin to your options file:
+
+	ckan.plugins = harvest ckan_harvester
+
+After adding it, a 'CKAN' option should appear in the 'New harvest source' form.
 
 
 The harvesting interface
@@ -213,10 +241,14 @@
         :returns: True if everything went right, False if errors were found
         '''
 
-See ckanext-inspire for a an example on how to implement the harvesting
+See the CKAN harvester for a an example on how to implement the harvesting
 interface:
 
-    https://bitbucket.org/okfn/ckanext-inspire/src/
+    ckanext-harvest/ckanext/harvest/harvesters/ckanharvester.py
+
+Here you can also find other examples of custom harvesters:
+
+    https://bitbucket.org/okfn/ckanext-pdeu/src/213d3fe4c36e/ckanext/pdeu/harvesters/
 
 
 Running the harvest jobs
@@ -237,3 +269,8 @@
 pending harvesting jobs::
 
       paster harvester run --config=../ckan/development.ini
+      
+After packages have been imported, the search index will have to be updated
+before the packages appear in search results (from the ckan directory):
+
+      paster search-index
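
The README change above asks for both `harvest` and `ckan_harvester` to be listed under `ckan.plugins`. A minimal sketch of how that setting could be checked, assuming the option lives in an `[app:main]` section of the ini file; the helper name `missing_harvest_plugins` is hypothetical and is not part of the extension:

```python
import configparser

# Plugins named in the README change.
REQUIRED_PLUGINS = {"harvest", "ckan_harvester"}


def missing_harvest_plugins(ini_text, section="app:main"):
    """Return the required plugins that are absent from ckan.plugins.

    The section name is an assumption; adjust it to match your ini file.
    """
    # RawConfigParser avoids interpolating values such as %(here)s.
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    enabled = set(parser.get(section, "ckan.plugins", fallback="").split())
    return REQUIRED_PLUGINS - enabled


sample = """
[app:main]
ckan.plugins = harvest ckan_harvester
"""

if __name__ == "__main__":
    print(sorted(missing_harvest_plugins(sample)))
```

An empty result means both plugins are enabled; any names returned still need to be added to the `ckan.plugins` line.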


http://bitbucket.org/okfn/ckanext-harvest/changeset/336acf7dbca6/
changeset:   336acf7dbca6
user:        amercader
date:        2011-07-18 18:35:03
summary:     Add docs to base harvester functions
affected #:  1 file (520 bytes)

--- a/ckanext/harvest/harvesters/base.py	Mon Jul 18 17:34:24 2011 +0100
+++ b/ckanext/harvest/harvesters/base.py	Mon Jul 18 17:35:03 2011 +0100
@@ -21,17 +21,24 @@
 
 class HarvesterBase(SingletonPlugin):
     '''
-    Generic class for publicdata.eu harvesters
+    Generic class for  harvesters with helper functions
     '''
     implements(IHarvester)
 
     def _gen_new_name(self,title):
+        '''
+        Creates a URL friendly name from a title
+        '''
         name = munge_title_to_name(title).replace('_', '-')
         while '--' in name:
             name = name.replace('--', '-')
         return name
 
     def _check_name(self,name):
+        '''
+        Checks if a package name already exists in the database, and adds
+        a counter at the end if it does exist.
+        '''
         like_q = u'%s%%' % name
         pkg_query = Session.query(Package).filter(Package.name.ilike(like_q)).limit(100)
         taken = [pkg.name for pkg in pkg_query]
@@ -46,16 +53,26 @@
             return None
 
     def _save_gather_error(self,message,job):
+        '''
+        Helper function to create an error during the gather stage.
+        '''
         err = HarvestGatherError(message=message,job=job)
         err.save()
         log.error(message)
 
     def _save_object_error(self,message,obj,stage=u'Fetch'):
+        '''
+        Helper function to create an error during the fetch or import stage.
+        '''
         err = HarvestObjectError(message=message,object=obj,stage=stage)
         err.save()
         log.error(message)
 
     def _create_harvest_objects(self, remote_ids, harvest_job):
+        '''
+        Given a list of remote ids and a Harvest Job, create as many Harvest Objects and
+        return a list of its ids to be returned to the fetch stage.
+        '''
         try:
             object_ids = []
             if len(remote_ids):
@@ -87,9 +104,7 @@
 
         '''
         try:
-            #from pprint import pprint 
-            #pprint(package_dict)
-            ## change default schema
+            # Change default schema
             schema = default_package_schema()
             schema["id"] = [ignore_missing, unicode]
 
@@ -144,9 +159,3 @@
             self._save_object_error('%r'%e,harvest_object,'Import')
 
         return None
-
-
-
-
-
-
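
The naming helpers documented in this changeset can be illustrated with a self-contained sketch. It assumes a simple regex stand-in for CKAN's `munge_title_to_name`, an in-memory set of taken names in place of the database query, and a retry limit of 100; all three are assumptions made for illustration, not the code in `base.py`:

```python
import re


def gen_new_name(title):
    """Create a URL-friendly name from a title."""
    # Stand-in for munge_title_to_name (assumption): lowercase the title
    # and replace anything outside [a-z0-9_-] with an underscore.
    name = re.sub(r"[^a-z0-9_-]", "_", title.strip().lower())
    # As in the diff: use hyphens and collapse repeated hyphens.
    name = name.replace("_", "-")
    while "--" in name:
        name = name.replace("--", "-")
    return name


def check_name(name, taken, max_suffix=100):
    """Return name, or name plus a counter if it is already taken.

    `taken` replaces the database lookup; `max_suffix` is an assumed limit.
    Returns None when no free name is found within the limit.
    """
    if name not in taken:
        return name
    for counter in range(1, max_suffix + 1):
        candidate = name + str(counter)
        if candidate not in taken:
            return candidate
    return None


if __name__ == "__main__":
    name = gen_new_name("Open Data: 2011")
    print(name)
    print(check_name(name, {name}))
```

Because the counter is appended directly to the name, a name that already ends in a digit gets a longer numeric tail rather than an incremented one.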


http://bitbucket.org/okfn/ckanext-harvest/changeset/89ecca73e087/
changeset:   89ecca73e087
user:        amercader
date:        2011-07-18 18:35:32
summary:     Use API version defined in config if present
affected #:  1 file (69 bytes)

--- a/ckanext/harvest/harvesters/ckanharvester.py	Mon Jul 18 17:35:03 2011 +0100
+++ b/ckanext/harvest/harvesters/ckanharvester.py	Mon Jul 18 17:35:32 2011 +0100
@@ -20,7 +20,6 @@
     '''
     config = None
 
-    #TODO: check different API versions
     api_version = '2'
 
     def _get_rest_api_offset(self):
@@ -44,6 +43,10 @@
     def _set_config(self,config_str):
         if config_str:
             self.config = json.loads(config_str)
+
+            if 'api_version' in self.config:
+                self.api_version = self.config['api_version']
+
             log.debug('Using config: %r', self.config)
         else:
             self.config = {}
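
The override added in this changeset can be tried outside CKAN with a small stand-in class. The class name `ConfigurableHarvester`, the method names, and the `/api/<version>/rest` offset format are assumptions made for the sketch; only the `api_version` handling mirrors the diff:

```python
import json


class ConfigurableHarvester(object):
    """Stand-in for the CKAN harvester (hypothetical, for illustration)."""

    config = None
    api_version = '2'  # default used when the config does not set one

    def set_config(self, config_str):
        if config_str:
            self.config = json.loads(config_str)
            # Same check as in the diff: prefer the configured version.
            if 'api_version' in self.config:
                self.api_version = self.config['api_version']
        else:
            self.config = {}

    def rest_api_offset(self):
        # Assumed URL layout; the real offset is built elsewhere.
        return '/api/%s/rest' % self.api_version


if __name__ == "__main__":
    harvester = ConfigurableHarvester()
    harvester.set_config('{"api_version": "1"}')
    print(harvester.rest_api_offset())
```

In this sketch the override is stored on the instance, so a version set by one config string stays in effect until a later config string sets a different one.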

Repository URL: https://bitbucket.org/okfn/ckanext-harvest/

--

This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.



