[ckan-dev] Datapusher can't upload to DataStore: JobError bad response on resource_show

Florian May florian.wendelin.mayer at gmail.com
Thu Nov 20 10:47:10 UTC 2014


After some soul-searching we traced the error to the AWS proxy
settings, which were turning back datapusher's (perfectly fine) HTTP
requests to the (perfectly working) CKAN API's resource_show with a "403
Forbidden (nginx)".
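
For anyone wanting to tell the two cases apart programmatically, here is a
rough sketch. It assumes that CKAN's action API answers with a JSON object
containing a "success" key even on errors, whereas a proxy such as nginx
answers with its own HTML error page. The function name is made up for
illustration and is not part of CKAN or datapusher.

```python
import json


def rejected_before_ckan(status_code, body):
    """Guess whether an error response came from a proxy rather than CKAN.

    Heuristic only (assumption): CKAN's action API replies with a JSON
    object that has a "success" key, even for errors, while a proxy such
    as nginx replies with its own HTML error page.
    """
    if status_code < 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. nginx's HTML "403 Forbidden" page.
        return True
    # JSON without a "success" key is also unlikely to be from CKAN.
    return not (isinstance(payload, dict) and "success" in payload)
```

Fed the HTML body from the datapusher error log, this should flag the 403 as
coming from something in front of CKAN rather than from CKAN itself.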

Once the offending request was identified, the decisive test was to send the
same request via curl from the same server vs. a different machine, and to the
fully resolved hostname (set by our own reverse proxy, pointing to AWS's
proxy, pointing to the VM) vs. the local AWS machine's name. A local
request to the local machine's name (which didn't leave the machine)
worked, whereas any request that left the VM and tried to come back
in to reach the CKAN API inside the same VM got 403'd.
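
The same comparison can be scripted. The sketch below is only an
illustration of the test described above, not something from datapusher:
compare_routes and http_status are hypothetical helper names, and the base
URLs and resource id are placeholders to replace with your own. It only
handles HTTP error statuses; connection failures will still raise.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET to url, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want here.
        return err.code


def compare_routes(base_urls, resource_id, fetch=http_status):
    """Call resource_show via each base URL and collect the status codes."""
    results = {}
    for base in base_urls:
        url = "{}/api/3/action/resource_show?id={}".format(
            base.rstrip("/"), resource_id)
        results[base] = fetch(url)
    return results
```

Run on the CKAN server itself with something like
compare_routes(["http://localhost:5000", "http://MYCKAN"], "RESOURCE_ID"),
then again from a different machine. If the local name returns 200 and the
public hostname returns 403 only when called from the server, the proxy in
between is the likely culprit.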

Confusingly, sending requests from the browser using a REST plugin worked
fine (again, that was the AWS proxy configuration doing its job).

The CKAN production deployment docs could include a section about testing
setup of the reverse proxy and hostnames using curl requests, much like the
datapusher's docs.

Hope that's helpful if anyone gets stuck in the same place!

Cheers,
Florian
Dept Parks & Wildlife WA


On Wed, Nov 19, 2014 at 5:33 PM, Florian May <
florian.wendelin.mayer at gmail.com> wrote:

> Hi all,
>
> I'm having some grief installing the datapusher, I'm getting a "403" from
> the CKAN API when the datapusher tries to get details about a resource by
> calling CKAN API's resource_show.
> I'm getting "*Error:* Process completed but unable to post to result_url"
>
>
> My environment is an Ubuntu 14.04 with a CKAN 2.3a master, last good
> version before the repoze upgrade, Nov 12, 2014 (
> https://github.com/ckan/ckan/tree/424bd37b806ebf7eaf8cbd83cdc0575798921aff
> )
>
> As I installed from source, I've got a separate datapusher (branch master)
>
> https://github.com/ckan/datapusher/tree/78f309c6753d26f6c94200b4d4cabeb5de7b1fea
>
> I'm running two CKANs with two separate sets of dbs and configs out of the
> same code repo (ports 5000 and 5001), plus the datapusher (port 8800)
> through Apache2 (2.4.7) mod_wsgi.
> I've set the "Require all granted" directive in all VirtualHost sections
> and the reverse proxy module
>
>     <IfModule mod_rpaf.c>
>         RPAFenable On
>         RPAFsethostname On
>         RPAFproxy_ips 127.0.0.1
>     </IfModule>
>
> in the VirtualHost section of CKAN's Apache configs.
>
> All configs are vanilla as per the "installation from source"
> instructions, only my installation lives in /mnt/ckan instead of
> /usr/lib/ckan and paths are modified to that location.
>
> Datapusher responds to a curl 0.0.0.0:8800 with a polite
> {  "help": "\n        Get help at:\n
> http://ckan-service-provider.readthedocs.org/."}
>
> However, when I try to upload a CSV resource to the DataStore through the
> GUI (Manage resource, tab DataStore), I'm running into a permission error:
>
> ==> /var/log/apache2/datapusher.error.log <==
> [Wed Nov 19 17:03:59.810043 2014] [:error] [pid 6722] Job
> "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)"
> raised an exception
> [Wed Nov 19 17:03:59.810083 2014] [:error] [pid 6722] Traceback (most
> recent call last):
> [Wed Nov 19 17:03:59.810089 2014] [:error] [pid 6722]   File
> "/mnt/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py",
> line 512, in _run_job
> [Wed Nov 19 17:03:59.810093 2014] [:error] [pid 6722]     retval =
> job.func(*job.args, **job.kwargs)
> [Wed Nov 19 17:03:59.810096 2014] [:error] [pid 6722]   File
> "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 226, in
> push_to_datastore
> [Wed Nov 19 17:03:59.810099 2014] [:error] [pid 6722]     resource =
> get_resource(resource_id, ckan_url, api_key)
> [Wed Nov 19 17:03:59.810102 2014] [:error] [pid 6722]   File
> "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 180, in
> get_resource
> [Wed Nov 19 17:03:59.810106 2014] [:error] [pid 6722]
> check_response(r, url, 'CKAN')
> [Wed Nov 19 17:03:59.810109 2014] [:error] [pid 6722]   File
> "/mnt/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 91, in
> check_response
> [Wed Nov 19 17:03:59.810112 2014] [:error] [pid 6722]
> resp=response.text[:200]))
> [Wed Nov 19 17:03:59.810115 2014] [:error] [pid 6722] JobError: CKAN bad
> response. Status code: 403 Forbidden. At:
> http://ckan-private.dpaw.wa.gov.au/api/3/action/resource_show. Response:
> <html>\r
> [Wed Nov 19 17:03:59.810119 2014] [:error] [pid 6722] <head><title>403
> Forbidden</title></head>\r
> [Wed Nov 19 17:03:59.810122 2014] [:error] [pid 6722] <body
> bgcolor="white">\r
> [Wed Nov 19 17:03:59.810126 2014] [:error] [pid 6722] <center><h1>403
> Forbidden</h1></center>\r
> [Wed Nov 19 17:03:59.810132 2014] [:error] [pid 6722]
> <hr><center>nginx</center>\r
> [Wed Nov 19 17:03:59.810135 2014] [:error] [pid 6722] </body>\r
> [Wed Nov 19 17:03:59.810138 2014] [:error] [pid 6722] </html>\r
>
> So get_resource in datapusher's jobs.py tries to send a POST to
> resource_show including my API key, and gets 403'd by the CKAN API.
> I've changed that to a GET request to
> http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID to the same
> effect.
>
> I don't understand why first of all this needs to be a POST request, where
> CKAN's resource_show is a GETable function and doesn't need an API key.
> Secondly, I can send
> http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID with appropriate
> server name and resource id through my browser and get the correct JSON
> response back.
> Why is authentication necessary?
>
> Could anyone help me understand the difference between:
> - http://MYCKAN/api/3/action/resource_show?id=RESOURCE_ID sent through a
> browser, received by Apache's wsgi and presumably executed as Apache's
> system user (www-data) calling ckan's API, which works fine, and
> - sending the same request through a REST plugin as both GET and POST
> (guess what - works fine), and
> - the same request sent from datapusher's wsgi app as an HTTP POST request
> to the full server name (how far "outside" does that request go in terms of
> firewall and apache's reverse proxy module?), which will receive a "403
> Forbidden"?
>
> How is datapusher's POST different to my browser's POST?
> Did anyone else get the latest CKAN to work with the datapusher as a
> source install?
>
> Cheers,
> Florian
>