[ckan-dev] harvesting user and deleting a user, PostgreSQL version ?

Hildegard Gerlach hildegard.gerlach at jrc.ec.europa.eu
Wed Nov 27 08:52:01 UTC 2013


Dear Joe,

thanks for answering.
I have set the branch of ckanext-harvest to stable
git branch
   master
* stable


I have done an upgrade of CKAN to 2.1.1 some days ago and also the 
extensions (stable), but the error was there before and nothing changed.

I don't know what git show (in ckanext-harvest) is supposed to tell me; 
I get the following:
git show
commit cb724b81ead8529bf109b4df0f8e348cf7b835b0
Author: amercader <amercadero at gmail.com>
Date:   Thu Oct 24 12:33:44 2013 +0100

     Improve organizations dropdown on source form

diff --git a/ckanext/harvest/templates_new/source/new_source_form.html b/ckanext/harvest/templates_new/source/new_source_form.html
index 8f79c34..5e3f4c9 100644
--- a/ckanext/harvest/templates_new/source/new_source_form.html
+++ b/ckanext/harvest/templates_new/source/new_source_form.html
@@ -50,24 +50,28 @@
    {% if data.group_id %}
      <input type="hidden" name="groups__0__id" value="{{ data.group_id }}" />
    {% endif %}
-  {% set existing_org = data.owner_org or data.group_id %}
-  {% if h.check_access('sysadmin') or data.get('state', 'draft').startswith('draft') or data.get('state', 'none') == 'none' %}
-    {% set organizations_available = h.organizations_available('create_dataset') %}
-    {% if organizations_available %}
-      <div class="control-group">
-        <label for="field-organizations" class="control-label">{{ _('Organization') }}</label>
-        <div class="controls">
-          <select id="field-organizations" name="owner_org" data-module="autocomplete">
-            <option value="">{{ _('Select an organization...') }}</option>
-            {% for organization in organizations_available %}
-              {# get out first org from users list only if there is not an existing org #}
-              {% set selected_org = (existing_org and existing_org == organization.id) or (not existing_org and organization.id == organizations_available[
-              <option value="{{ organization.id }}" {% if selected_org %} selected="selected" {% endif %}>{{ organization.name }}</option>
-            {% endfor %}
-          </select>
-        </div>
+
+  {% set dataset_is_draft = data.get('state', 'draft').startswith('draft') or data.get('state', 'none') == 'none' %}
+  {% set dataset_has_organization = data.owner_org or data.group_id %}
+  {% set organizations_available = h.organizations_available('create_dataset') %}
+  {% set user_is_sysadmin = h.check_access('sysadmin') %}
+  {% set show_organizations_selector = organizations_available and (user_is_sysadmin or dataset_is_draft) %}
+
+  {% if show_organizations_selector %}
+    {% set existing_org = data.owner_org %}
+    <div class="control-group">
+      <label for="field-organizations" class="control-label">{{ _('Organization') }}</label>
+      <div class="controls">

paster harvester purge_queues  didn't change anything either.

I really think the user is wrong (it actually seems to be empty), but I don't understand why. In the web interface I am logged in as the user admin. When I run the paster command (inside the ckan virtualenv), I don't specify any user, neither on the command line nor in the CKAN configuration file.
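
One way to double-check which harvest user, if any, the configuration file sets is to read the option straight out of the ini file. The following is only a sketch: the option name ckanext.spatial.harvest.user_name is the one from the ckanext-spatial documentation quoted further down in this thread, the [app:main] section is assumed to be where the CKAN settings live, and the function name is made up for illustration.

```python
import configparser


def harvest_user_from_ini(ini_text,
                          section="app:main",
                          option="ckanext.spatial.harvest.user_name"):
    """Return the configured harvest user name, or None if it is not set.

    RawConfigParser is used so that '%(here)s'-style values elsewhere in
    the file are not interpolated while parsing.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    if parser.has_option(section, option):
        return parser.get(section, option)
    return None


if __name__ == "__main__":
    # Hypothetical ini content; in practice read /etc/ckan/production.ini.
    sample = "[app:main]\nckanext.spatial.harvest.user_name  =  admin\n"
    print(repr(harvest_user_from_ini(sample)))
```

A result of None would mean the option is absent, in which case (according to the quoted documentation) the internal site admin user should be used.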

I have added the code below to the file update.py, which is where the 
error is raised:
File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/auth/update.py", line 28, in harvest_source_update
     raise pt.ObjectNotFound(pt._('Harvest source not found'))
ckan.logic.NotFound: Harvest source not found

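     # temporary debugging: dump the user and source id seen by the auth check
     # (needs "import os" at the top of update.py)
     with open('/usr/local/ckan/Hildetest/Hilde', 'w') as f:
         f.write("user " + user)
         f.write(os.linesep)
         f.write(pt._('User {0} blabla {1}').format(user, source_id))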

and the file Hilde that this creates contains

more Hildetest/Hilde
user
User  blabla None
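
In the output above an empty user name is hard to tell apart from a missing one. A small variation of the same debugging idea, sketched below, formats the values with repr() so that '' and None look different, and sends them to the log instead of a file. The helper names and the logger name are hypothetical, and it is an assumption that the auth function receives the user as context['user'] and the source id as data_dict['id'] (the traceback in this thread does show the call being made with {'id': source_id}).

```python
import logging

log = logging.getLogger("harvest.debug")  # hypothetical logger name


def describe_auth_call(context, data_dict):
    """Summarise who is calling and for which harvest source.

    %r (repr) keeps an empty string ('') distinguishable from None,
    which plain string concatenation does not.
    """
    user = context.get("user")        # assumed key for the acting user
    source_id = data_dict.get("id")   # assumed key for the harvest source
    return "user=%r source_id=%r" % (user, source_id)


def log_auth_call(context, data_dict):
    """Write the summary to the debug log."""
    log.debug(describe_auth_call(context, data_dict))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Mirrors the situation described above: empty user, no source id.
    log_auth_call({"user": ""}, {})
```

With the values reported in this message, the summary would show an empty string for the user and None for the source id, which makes it clear that both are missing for different reasons.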

I have already written in the earlier posts of this thread that we have 
this mysterious empty user and cannot delete it.

paster --plugin=ckan user list -c /etc/ckan/production.ini
Users:
count = 4
name=
name=admin
name=logged_in
name=visitor

When I ran initdb for CKAN (not the extensions) on an empty database, 
this user was created.
If I look at the sysadmin users in the administration interface, I see
/user/640f0c2d-4005-4bab-b326-c21ae9c491c3 
<http://drdsi-data.ies.jrc.it/user/640f0c2d-4005-4bab-b326-c21ae9c491c3>
admin

I don't know what the first one means.

On the second machine, which was installed earlier (also Red Hat), 
this empty user doesn't appear.

Could there be an issue with using PostgreSQL 9.2?

Hilde


On 11/27/2013 8:50 AM, Joe Tsoi wrote:
> Hi,
>
> The harvest source not found error looks like it might not be the
> harvest user as it could be bailing out before it gets to the point
> where it uses it.
>
> I think there are two separate things that might help. Either your
> harvest extension isn't up to date; could you do a git show in the
> ckanext-harvest directory? Just so I can know which version of the
> code we're dealing with here.
>
> Or you've got jobs left over from a previous harvest run, from before
> you wiped the database. In that case, could you do a paster harvester
> purge_queues and then start again with the empty database?
>
>
>
> On 26 November 2013 17:45, Hildegard Gerlach
> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
>> Hi again,
>>
>> I did another test. I created a new database, initialized it for ckan, the
>> harvest and spatial extension and created the sysadmin user admin.
>> Anyway, an empty user (name '') is created again when initializing the
>> database for CKAN (version 2.1.1)
>> When I try to harvest I get the same error
>>
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/auth/update.py", line 28, in harvest_source_update
>>     raise pt.ObjectNotFound(pt._('Harvest source not found'))
>> ckan.logic.NotFound: Harvest source not found
>>
>> and in the logfile I have the following:
>>
>> 2013-11-26 16:19:35,913 DEBUG [ckanext.harvest.model] Harvest tables defined
>> in memory
>> 2013-11-26 16:19:35,917 DEBUG [ckanext.harvest.model] Harvest tables already
>> exist
>> 2013-11-26 16:19:35,938 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables defined in memory
>> 2013-11-26 16:19:35,944 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables already exist
>> 2013-11-26 16:19:35,960 DEBUG [ckanext.harvest.queue] pika connection using
>> {'retry_delay': 2.0, 'frame_max': 10000, 'channel_max': 0, 'locale'  :
>> 'en_US', 'socket_timeout': 0.25, 'ssl': False, 'host': 'localhost',
>> 'ssl_options': {}, 'virtual_host': '/', 'heartbeat': 0, 'credentials':
>> <pika.credentials.PlainCredentials object at 0x3949a50>,
>> 'backpressure_detection': False, 'port': 5672, 'connection_attempts': 1}
>> 2013-11-26 16:19:37,008 DEBUG [ckanext.harvest.queue] Gather queue consumer
>> registered
>> 2013-11-26 16:20:17,507 DEBUG [ckanext.harvest.model] Harvest tables defined
>> in memory
>> 2013-11-26 16:20:17,511 DEBUG [ckanext.harvest.model] Harvest tables already
>> exist
>> 2013-11-26 16:20:17,532 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables defined in memory
>> 2013-11-26 16:20:17,540 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables already exist
>> 2013-11-26 16:20:17,555 DEBUG [ckanext.harvest.queue] pika connection using
>> {'retry_delay': 2.0, 'frame_max': 10000, 'channel_max': 0, 'locale' :
>> 'en_US', 'socket_timeout': 0.25, 'ssl': False, 'host': 'localhost',
>> 'ssl_options': {}, 'virtual_host': '/', 'heartbeat': 0, 'credentials':
>> <pika.credentials.PlainCredentials object at 0x39be990>,
>> 'backpressure_detection': False, 'port': 5672, 'connection_attempts': 1}
>> 2013-11-26 16:20:18,602 DEBUG [ckanext.harvest.queue] Fetch queue consumer
>> registered
>> 2013-11-26 16:20:32,927 DEBUG [ckanext.harvest.model] Harvest tables defined
>> in memory
>> 2013-11-26 16:20:32,931 DEBUG [ckanext.harvest.model] Harvest tables already
>> exist
>> 2013-11-26 16:20:32,952 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables defined in memory
>> 2013-11-26 16:20:32,959 DEBUG [ckanext.spatial.model.package_extent] Spatial
>> tables already exist
>> 2013-11-26 16:20:32,974 INFO  [ckanext.harvest.logic.action.update] Harvest
>> job run: {}
>> 2013-11-26 16:21:03,016 WARNI [ckan.lib.maintain] Function nav_named_link()
>> in module ckan.lib.helpers has been deprecated and will be removed in a
>> later release of ckan. h.nav_named_link is deprecated please use h.nav_link
>> NOTE: you will need to pass the route_name as a named parameter
>>
>>
>> We are using solr as a backend
>> ckanext.spatial.search_backend = solr
>>
>> and have PostgreSQL 9.2 with PostGIS 2.1.0
>>
>> ckandb=> SELECT PostGIS_full_version();
>> postgis_full_version
>> ----------------------------------------------------------------------
>>   POSTGIS="2.1.0 r11822" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6
>> March 2012" GDAL="GDAL 1.9.2, released 2012/10/08"
>> LIBXML="2.7.6" LIBJSON="UNKNOWN" TOPOLOGY RASTER
>> (1 row)
>>
>> and the table package_extent is slightly different from the one created
>> with an older version of PostgreSQL/PostGIS
>>
>> ckandb=> \d package_extent
>>            Table "public.package_extent"
>>     Column   |          Type           | Modifiers
>> ------------+-------------------------+-----------
>>   package_id | text                    | not null
>>   the_geom   | geometry(Geometry,4326) |
>> Indexes:
>>      "package_extent_pkey" PRIMARY KEY, btree (package_id)
>>      "idx_package_extent_the_geom" gist (the_geom)
>>
>>
>> I also cannot clean the database, I get the following error:
>>
>> paster db clean -c /etc/ckan/production.ini
>> /usr/local/ckan/pyenv/lib/python2.6/site-packages/sqlalchemy/engine/reflection.py:47:
>> SAWarning: Did not recognize type 'geometry(Geometry,4326)' of column
>> 'the_geom'
>>    ret = fn(self, con, *args, **kw)
>>
>> Is this version of PostgreSQL/PostGIS not yet supported by CKAN ?
>>
>> I really don't know what to try any more.
>>
>> Thanks
>>
>> Hilde
>>
>>
>>
>> On 26.11.2013, at 12:09, Hildegard Gerlach
>> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
>>
>> Hi everyone,
>>
>>
>> http://docs.ckan.org/projects/ckanext-spatial/en/latest/harvesters.html says
>>
>> By default the harvesting actions (eg creating or updating datasets) will be
>> performed by the internal site admin user. This is the recommended setting,
>> but if necessary, it can be overridden with the
>> ckanext.spatial.harvest.user_name config option, eg to support the old
>> hardcoded ‘harvest’ user:
>>
>> ckanext.spatial.harvest.user_name  =  harvest
>>
>> We have another instance of CKAN (both on Redhat) on another machine where
>> the harvester works without having a user harvest.
>>
>> I am working with Elena and we obviously have an authorization problem
>> when running the harvesting.
>>
>> paster --plugin=ckanext-harvest harvester run -c /etc/ckan/production.ini
>>
>> Traceback (most recent call last):
>>   File "/usr/local/ckan/pyenv/bin/paster", line 9, in <module>
>>     load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
>>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 104, in run
>>     invoke(command, command_name, options, args[1:])
>>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 143, in invoke
>>     exit_code = runner.run(args)
>>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 238, in run
>>     result = self.command()
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 113, in command
>>     self.run_harvester()
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 278, in run_harvester
>>     jobs = get_action('harvest_jobs_run')(context,{})
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/logic/__init__.py", line 329, in wrapped
>>     return _action(context, data_dict, **kw)
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/action/update.py", line 291, in harvest_jobs_run
>>     jobs = harvest_job_list(context,{'source_id':source_id,'status':u'Running'})
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/logic/__init__.py", line 386, in wrapper
>>     return action(context, data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/action/get.py", line 226, in harvest_job_list
>>     check_access('harvest_job_list',context,data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/logic/__init__.py", line 207, in check_access
>>     logic_authorization = new_authz.is_authorized(action, context, data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/new_authz.py", line 82, in is_authorized
>>     return auth_function(context, data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/auth/get.py", line 85, in harvest_job_list
>>     {'id': source_id})
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/logic/__init__.py", line 207, in check_access
>>     logic_authorization = new_authz.is_authorized(action, context, data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckan/ckan/new_authz.py", line 82, in is_authorized
>>     return auth_function(context, data_dict)
>>   File "/usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/auth/update.py", line 28, in harvest_source_update
>>     raise pt.ObjectNotFound(pt._('Harvest source not found'))
>> ckan.logic.NotFound: Harvest source not found
>>
>> I changed the
>> /usr/local/ckan/pyenv/src/ckanext-harvest/ckanext/harvest/logic/auth/update.py
>> to output some data, and I get an empty user (no value)
>> and source_id: None
>>
>> Now I tried to set
>>
>> ckanext.spatial.harvest.user_name  =  admin
>>
>> but this didn't change anything.
>>
>> Then we have this empty user (name=  ) which might cause a problem, but we
>> don't know how to get rid of it (as already described).
>>
>> paster --plugin=ckan user list -c /etc/ckan/production.ini
>> Users:
>> count = 9
>> name=
>> name=admin
>> name=annafan
>> name=joeadmin
>> name=logged_in
>> name=russianfan
>> name=tester
>> name=testsysadmin
>> name=visitor
>>
>> Any help would be appreciated.
>>
>> Otherwise is there a way to uninstall and reinstall the harvesting extension
>> ? We did an upgrade to 2.1.1 of CKAN and extensions to stable, but nothing
>> changed.
>>
>> Kind regards
>>
>> Hilde
>>
>>
>>
>> On 11/25/2013 9:05 PM, Vitor Baptista wrote:
>>
>> Hi Elena,
>>
>> By default, ckanext-harvest uses the user "harvest". You'll have some problems
>> if it doesn't exist (or you haven't changed it to another username). To
>> solve it, you can simply add this user as a sysadmin. The password doesn't
>> matter.
>>
>> Cheers,
>>
>>
>> 2013/11/25 Elena Camossi <elena.camossi at ext.jrc.ec.europa.eu
>> <mailto:elena.camossi at ext.jrc.ec.europa.eu>>
>>
>>     Hi everyone,
>>
>>     we are experiencing some problems with harvesting (CKAN 2.1, on
>>     RedHat). Doing some debugging, we discovered that harvesting seems
>>     to be run by a user with no name.
>>
>>     Getting the list of users with "paster user list", we discovered
>>     that one of the users indeed has no name, and the table user in
>>     Postgres indeed has a row whose column name has the value ' '.
>>
>>     We have tried to delete this user with "paster user remove". The
>>     command succeeds but the user still exists. We have tried deleting
>>     it from the PostgreSQL database; the row is removed but it reappears
>>     after running "paster user list" again...
>>
>>     Is there any other way to delete a CKAN user?
>>
>>     Does harvesting use some default CKAN user?
>>
>>     Thanks a lot for your help.
>>
>>     Kind regards,
>>     -Elena
>>
>>
>>
>>
>>
>>
>> --
>>
>> Vítor Baptista
>>
>> Developer |http://vitorbaptista.com | LinkedIn
>> <http://www.linkedin.com/in/vitorbaptista> | @vitorbaptista
>> <http://twitter.com/vitorbaptista>
>>
>> The Open Knowledge Foundation <http://okfn.org>
>>
>> /Empowering through Open Knowledge/




