[ckan-discuss] Harvester questions/issues

Stéphane Guidoin stephane at opennorth.ca
Thu Oct 31 01:48:13 GMT 2013


Thank you Adrià!

It does indeed work better with Redis! Maybe the installation section of
ckanext-harvest should suggest trying Redis first (or at least mention that
RabbitMQ does not always work).

In case it helps to investigate: I am using Ubuntu 13.04, Python 2.7 and
CKAN 2.2, and I used the Ubuntu/Debian packaged RabbitMQ (apt-get
install rabbitmq-server).
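
For anyone else hitting this, the switch is a settings change in the CKAN
ini file. Below is a minimal sketch, assuming the ckan.harvest.mq.* option
names described in the ckanext-harvest README; the hostname, port and
database values are assumed local defaults and should be adjusted. The
Python (3) code only parses the fragment to show which values would be read,
and read_mq_settings is a hypothetical helper, not part of the extension.

```python
import configparser

# Fragment of a CKAN ini file (e.g. production.ini). Option names follow the
# ckanext-harvest README; the values are assumed defaults for a local Redis.
INI_FRAGMENT = """
[app:main]
ckan.harvest.mq.type = redis
ckan.harvest.mq.hostname = localhost
ckan.harvest.mq.port = 6379
ckan.harvest.mq.redis_db = 0
"""


def read_mq_settings(ini_text):
    """Return the ckan.harvest.mq.* options found in the [app:main] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    prefix = "ckan.harvest.mq."
    return {
        key[len(prefix):]: value
        for key, value in parser["app:main"].items()
        if key.startswith(prefix)
    }


settings = read_mq_settings(INI_FRAGMENT)
print(settings["type"])
```

The gather and fetch consumers need to be restarted after changing the
backend so that they pick up the new setting.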

--

Another question for you, folks: the harvester does not copy the resource to
the local CKAN instance. It just points to the resource where it is. In our
case, we were expecting to connect our server to an *internal* CSW server
(so the resources of that server would not be available to the public). So we
need CKAN to copy the resources of the CSW locally. I guess we will have to
make some code changes to do that file copy. Am I right?
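
A rough sketch of the kind of change this would involve, not actual
ckanext-harvest or CSW harvester code: every name here (localize_resources,
fetch, storage_dir, public_base_url) is hypothetical, and it assumes the
dataset is available as a dict with a "name" and a list of "resources" that
each carry a "url". The download function is injected so the logic can be
tried without network access.

```python
import os
import tempfile
from urllib.parse import urlparse


def localize_resources(package_dict, fetch, storage_dir, public_base_url):
    """Copy each resource file locally and point its URL at the local copy.

    fetch(url) -> bytes is supplied by the caller (hypothetical hook).
    """
    os.makedirs(storage_dir, exist_ok=True)
    for index, resource in enumerate(package_dict.get("resources", [])):
        original_url = resource["url"]
        # Keep the original file name when the URL path ends with one.
        name = os.path.basename(urlparse(original_url).path) or "resource"
        filename = "%s-%d-%s" % (package_dict["name"], index, name)
        with open(os.path.join(storage_dir, filename), "wb") as handle:
            handle.write(fetch(original_url))
        # Replace the internal URL with the publicly served copy.
        resource["url"] = public_base_url.rstrip("/") + "/" + filename
    return package_dict


# Demo with a stub fetcher and a temporary directory (no network needed).
demo_dir = tempfile.mkdtemp()
demo = localize_resources(
    {"name": "ds", "resources": [{"url": "http://internal.example/files/a.csv"}]},
    lambda url: b"x,y\n",
    demo_dir,
    "http://public.example/files/",
)
print(demo["resources"][0]["url"])
```

In a real setup, fetch could wrap urllib.request.urlopen(url).read(), and the
function would have to be called from wherever the harvester builds the
dataset before saving it; where exactly that hook lives is not something this
sketch settles.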

Stéphane



On Wed, Oct 30, 2013 at 6:25 AM, Adrià Mercader <adria.mercader at okfn.org> wrote:

> Hi Stéphane,
>
> I recommend clearing the source, restarting all the consumers and trying
> again. Clearing the source needs to be done via the web interface (in
> the Admin section of the source), and will delete all jobs, datasets
> etc for a particular source (The purge_queue command just removes
> items from the actual queue).
>
> You should see straight away some logging in the gather consumer
> console after running the run command if everything went fine.
>
> If problems persist, it might be worth checking whether switching to Redis
> solves the issue; we've found it to be more reliable.
>
> For CKAN harvesters you need to set the root CKAN URL (e.g.
> http://demo.ckan.org), and for CSW servers it shouldn't matter if you
> point to the root of the server or the GetCapabilities request.
>
> Let us know how it works for you,
>
> Adrià
>
>
>
>
> On 29 October 2013 16:18, Stéphane Guidoin <stephane at opennorth.ca> wrote:
> > Hello,
> >
> > I am currently evaluating how to automate the maintenance of datasets for a
> > CKAN implementation (City of Montréal).
> >
> > I am looking at the harvester and I am not able to get it working.
> >
> > So I installed both CKAN and CSW harvesters following the only guide
> > (https://github.com/okfn/ckanext-harvest) using RabbitMQ.
> >
> > I am able to access http://my_instance:5000/harvest, and the basic harvester CLI
> > commands work (I am able to add sources, list them, etc.).
> >
> > In order to run a manual job, I did the following:
> >
> > Add the source (CSW):
> > paster --plugin=ckanext-harvest harvester source mysource
> > "http://www.donnees.gouv.qc.ca/geonetwork/srv/eng/csw?SERVICE=CSW&request=GetCapabilities&AcceptVersion=2.0.2"
> > csw "My Source" --config=/etc/ckan/default/production.ini
> >
> > Add a job for this source:
> > paster --plugin=ckanext-harvest harvester job
> > 01aa2038-678b-4ce5-972e-ad4eaf9198f6
> > --config=/etc/ckan/default/production.ini
> >
> > (If I ask for the list of jobs, this one appears as "new")
> >
> > Then, in different consoles, I start the gather and the fetch:
> > paster --plugin=ckanext-harvest harvester gather_consumer
> > --config=/etc/ckan/default/production.ini
> > paster --plugin=ckanext-harvest harvester fetch_consumer
> > --config=/etc/ckan/default/production.ini
> >
> > And in a third console, I launch the run command:
> > paster --plugin=ckanext-harvest harvester run
> > --config=/etc/ckan/default/production.ini
> >
> > After this, I receive a message telling me the job has been sent to the
> > gather queue:
> > 2013-10-28 14:13:40,550 INFO  [ckanext.harvest.logic.action.update] Sent job
> > 97bd1013-9160-4359-9b5a-243fbd8bc30a to the gather queue
> >
> > But nothing happens: the gather consumer does nothing and no process runs, even
> > if I leave this running for hours. The job appears as running.
> >
> > If I try to delete the job (purge_queue), the job remains there, running. If
> > I add the job again, it tells me there is already a job for that source, but
> > if I try to start the "run" again, it tells me there is no new job.
> >
> > How is it supposed to behave? Did I miss something? Any idea if/where I
> > could find relevant logs? (RabbitMQ's log in /var/log/rabbitmq does not show
> > anything...)
> >
> > Slightly related question: what is the URL that should be configured for
> > both the CKAN and CSW harvesters? For CKAN, should I link to the CKAN root page or
> > directly to the API? For CSW, do I point to the basic CSW resource or to
> > "GetCapabilities"?
> >
> > Thank you
> >
> > Steph