[ckan-dev] Datapusher can't upload to DataStore: JobError bad response on resource_show

Florian May florian.wendelin.mayer at gmail.com
Wed Nov 19 09:33:01 UTC 2014


Hi all,

I'm having some grief installing the datapusher: I'm getting a "403" from
the CKAN API when the datapusher tries to get details about a resource by
calling the CKAN API's resource_show.
The error I'm getting is "Error: Process completed but unable to post to result_url".


My environment is Ubuntu 14.04 with CKAN 2.3a master, the last good version
before the repoze upgrade (Nov 12, 2014):
https://github.com/ckan/ckan/tree/424bd37b806ebf7eaf8cbd83cdc0575798921aff

As I installed from source, I've got a separate datapusher (branch master)
https://github.com/ckan/datapusher/tree/78f309c6753d26f6c94200b4d4cabeb5de7b1fea

I'm running two CKANs with two separate sets of DBs and configs out of the
same code repo (ports 5000 and 5001), plus the datapusher (port 8800), all
served by Apache2 (2.4.7) with mod_wsgi.
I've set the "Require all granted" directive in all VirtualHost sections,
and enabled the reverse proxy module in the VirtualHost section of CKAN's
Apache configs:

    <IfModule mod_rpaf.c>
        RPAFenable On
        RPAFsethostname On
        RPAFproxy_ips 127.0.0.1
    </IfModule>

All configs are vanilla as per the "installation from source" instructions,
except that my installation lives in /mnt/ckan instead of /usr/lib/ckan and
the paths are changed to match.

The datapusher responds to "curl 0.0.0.0:8800" with a polite

    {  "help": "\n        Get help at:\n http://ckan-service-provider.readthedocs.org/."}
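The same sanity check can be scripted in Python, for example to run it from the machine or account the datapusher runs under. This is a minimal sketch using only the standard library; the function names are hypothetical, and all it checks is that the root URL returns JSON with a "help" key, like the response above.

```python
import json
import urllib.request


def parse_banner(body):
    """Return the 'help' text from the datapusher root response.

    Raises ValueError if the body is not JSON or has no 'help' key
    (json.JSONDecodeError is a subclass of ValueError).
    """
    data = json.loads(body)
    if not isinstance(data, dict) or "help" not in data:
        raise ValueError("unexpected response: no 'help' key")
    return data["help"].strip()


def check_datapusher(base_url="http://0.0.0.0:8800", timeout=5.0):
    """Plain GET on the root path, equivalent to `curl 0.0.0.0:8800`."""
    with urllib.request.urlopen(base_url, timeout=timeout) as resp:
        return parse_banner(resp.read().decode("utf-8"))
```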

However, when I try to upload a CSV resource to the DataStore through the
GUI (Manage resource, DataStore tab), I run into a permission error:

==> /var/log/apache2/datapusher.error.log <==
[Wed Nov 19 17:03:59.810043 2014] [:error] [pid 6722] Job "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)" raised an exception
[Wed Nov 19 17:03:59.810083 2014] [:error] [pid 6722] Traceback (most recent call last):
[Wed Nov 19 17:03:59.810089 2014] [:error] [pid 6722]   File "/mnt/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job
[Wed Nov 19 17:03:59.810093 2014] [:error] [pid 6722]     retval = job.func(*job.args, **job.kwargs)
[Wed Nov 19 17:03:59.810096 2014] [:error] [pid 6722]   File "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 226, in push_to_datastore
[Wed Nov 19 17:03:59.810099 2014] [:error] [pid 6722]     resource = get_resource(resource_id, ckan_url, api_key)
[Wed Nov 19 17:03:59.810102 2014] [:error] [pid 6722]   File "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 180, in get_resource
[Wed Nov 19 17:03:59.810106 2014] [:error] [pid 6722]     check_response(r, url, 'CKAN')
[Wed Nov 19 17:03:59.810109 2014] [:error] [pid 6722]   File "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 91, in check_response
[Wed Nov 19 17:03:59.810112 2014] [:error] [pid 6722]     resp=response.text[:200]))
[Wed Nov 19 17:03:59.810115 2014] [:error] [pid 6722] JobError: CKAN bad response. Status code: 403 Forbidden. At: http://ckan-private.dpaw.wa.gov.au/api/3/action/resource_show. Response: <html>\r
[Wed Nov 19 17:03:59.810119 2014] [:error] [pid 6722] <head><title>403 Forbidden</title></head>\r
[Wed Nov 19 17:03:59.810122 2014] [:error] [pid 6722] <body bgcolor="white">\r
[Wed Nov 19 17:03:59.810126 2014] [:error] [pid 6722] <center><h1>403 Forbidden</h1></center>\r
[Wed Nov 19 17:03:59.810132 2014] [:error] [pid 6722] <hr><center>nginx</center>\r
[Wed Nov 19 17:03:59.810135 2014] [:error] [pid 6722] </body>\r
[Wed Nov 19 17:03:59.810138 2014] [:error] [pid 6722] </html>\r
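The last frame, check_response, is what turns the HTTP status into the JobError. A rough reconstruction based only on the message format in the log might look like the sketch below. It is hypothetical, not the datapusher's code: the real function takes a response object and its exact success codes and branches may differ.

```python
class JobError(Exception):
    """Hypothetical stand-in for the JobError seen in the traceback."""


def check_response(status_code, reason, text, url, who="CKAN", good_status=(200, 201)):
    """Raise JobError when status_code is not one of good_status.

    Only the message format is taken from the log; the rest is assumed.
    The body is truncated to its first 200 characters, as in the traceback.
    """
    if status_code in good_status:
        return None
    raise JobError(
        "{who} bad response. Status code: {code} {reason}. At: {url}. "
        "Response: {resp}".format(
            who=who, code=status_code, reason=reason, url=url, resp=text[:200]
        )
    )
```

Because only the first 200 characters of the body are kept, the HTML at the end of the log is the start of whichever page actually answered the request.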

So get_resource in the datapusher's jobs.py sends a POST to resource_show
including my API key, and gets 403'd by the CKAN API.
I've changed that to a GET request to
http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID, with the same result.
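To make the comparison concrete, here is a sketch of the two requests described above, built with Python's standard library rather than whatever HTTP client the datapusher actually uses. It assumes, from the description above, that get_resource POSTs a JSON body {"id": ...} with the API key in an Authorization header; the helper names and the MYCKAN / RESOURCE_ID placeholders are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def resource_show_url(ckan_url):
    """Action API endpoint for resource_show under the given CKAN base URL."""
    return ckan_url.rstrip("/") + "/api/3/action/resource_show"


def datapusher_style_request(ckan_url, resource_id, api_key):
    """POST with a JSON body and the API key in the Authorization header.

    Assumed to mirror what get_resource sends; not copied from its source.
    """
    body = json.dumps({"id": resource_id}).encode("utf-8")
    return urllib.request.Request(
        resource_show_url(ckan_url),
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def browser_style_request(ckan_url, resource_id):
    """Plain GET with the id in the query string and no API key."""
    query = urllib.parse.urlencode({"id": resource_id})
    return urllib.request.Request(resource_show_url(ckan_url) + "?" + query, method="GET")
```

Calling get_method(), full_url and header_items() on each request shows that they differ only in method, body, and the Content-Type and Authorization headers.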

First of all, I don't understand why this needs to be a POST request, when
CKAN's resource_show can be called with a GET and doesn't need an API key.
Secondly, I can open http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID
(with the appropriate server name and resource id) in my browser and get the
correct JSON response back.
So why is authentication necessary?

Could anyone help me understand the difference between:
- http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID sent through a
  browser, received by Apache's mod_wsgi and presumably executed as Apache's
  system user (www-data) calling CKAN's API, which works fine;
- the same request sent through a REST client plugin, as both GET and POST
  (guess what: it works fine); and
- the same request sent from the datapusher's WSGI app as an HTTP POST
  request to the full server name (how far "outside" does that request go in
  terms of the firewall and Apache's reverse proxy module?), which receives a
  "403 Forbidden"?
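One way to narrow down where the 403 comes from is to send the same request from the datapusher host and look at who answers. The 403 page in the log ends with "<center>nginx</center>", while the setup described here only mentions Apache, so the hypothetical probe helper below (standard library only, a sketch rather than a finished tool) reports the status code, the Server response header when the server sends one, and the start of the body. It can be run with and without proxy settings from the environment, because urllib, like curl, picks up http_proxy/https_proxy variables by default.

```python
import urllib.error
import urllib.request


def probe(request, timeout=10.0, use_env_proxy=True):
    """Send `request` and return (status, Server header, first 200 chars of body).

    With use_env_proxy=False, proxies from environment variables are ignored,
    so the request goes straight to the host named in the URL.
    """
    handlers = [] if use_env_proxy else [urllib.request.ProxyHandler({})]
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(request, timeout=timeout) as resp:
            body = resp.read(200)
            return resp.status, resp.headers.get("Server"), body.decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry headers and a body worth inspecting.
        body = err.read(200)
        return err.code, err.headers.get("Server"), body.decode("utf-8", "replace")
```

For example, build a POST to http://MYCKAN/api/3/action/resource_show with urllib.request.Request and compare probe(req) with probe(req, use_env_proxy=False); if the two disagree, or the Server header does not name Apache, the request is probably not reaching the Apache that serves CKAN.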

How is the datapusher's POST different from my browser's POST?
Has anyone else got the latest CKAN working with the datapusher as a source
install?

Cheers,
Florian