[ckan-dev] Dataset vocabulary tags deleted when adding resource

JD Bothma jd at openup.org.za
Tue Nov 20 10:29:44 UTC 2018


The pkg_dict that ckanext-extractor's worker gets from package_show at
https://github.com/stadt-karlsruhe/ckanext-extractor/blob/master/ckanext/extractor/tasks.py#L62
already has its vocabulary tags converted.

So when ckanext-extractor's worker calls
index_for('package').update_dict(pkg_dict) at
https://github.com/stadt-karlsruhe/ckanext-extractor/blob/master/ckanext/extractor/tasks.py#L110
there aren't any ('tags', ..., ...) keys in the data argument passed to the
converters.convert_from_tags callable:
https://github.com/ckan/ckan/blob/master/ckan/logic/converters.py#L93

Since converters.convert_from_tags overwrites data[key] unconditionally, and
the worker's index call triggers a second conversion of the tag vocabulary
fields, the worker ends up setting those fields to empty lists.
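
To illustrate what I think is happening, here is a minimal sketch. It is a
simplified stand-in, not the actual CKAN code: the function name, the
'sphere-vocab' id and the layout of the data dict are all assumptions made
for illustration.

```python
def convert_from_tags_sketch(vocab_id):
    """Simplified stand-in for a convert_from_tags-style converter.

    Hypothetical: this is not CKAN's implementation, and the data layout
    is assumed for illustration only.
    """
    def converter(key, data, errors, context):
        tags = []
        for k in list(data.keys()):
            # Only tag entries belonging to this vocabulary are collected.
            if k[0] == 'tags' and data[k].get('vocabulary_id') == vocab_id:
                tags.append(data[k]['name'])
        # Unconditional overwrite: any value already at data[key] is lost.
        data[key] = tags
    return converter


convert = convert_from_tags_sketch('sphere-vocab')

# First pass: the raw tag entries are still present.
raw = {('tags', 0): {'name': 'national', 'vocabulary_id': 'sphere-vocab'}}
convert(('sphere',), raw, {}, {})
first_pass = raw[('sphere',)]

# Second pass: the dict is already converted, so no tag entries remain.
already_converted = {('sphere',): ['national']}
convert(('sphere',), already_converted, {}, {})
second_pass = already_converted[('sphere',)]

print(first_pass, second_pass)
```

The second pass finds no tag entries and replaces the existing list with an
empty one, which matches what I see in validated_data_dict.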

I think the following are reasonable options, but I'm not sure what the
best one is and would like help/input:

   - the worker should be operating on a pre-converted pkg_dict so that
   converting it (when indexing it) has the expected result
      - in this case, how? Is there a context flag to package_show that can
      give an unconverted dict?
   - index_for('package').update_dict(pkg_dict) should handle an
   already-converted dict safely, perhaps by having an option not to convert
   it a second time
      - how? Its only optional argument is defer_commit
   - converters should be idempotent, in which case this is a CKAN bug
      - unlikely - it sounds weird, and I don't think there is really enough
      metadata to support this safely
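
For the third option, a rough sketch of what a more idempotent converter
might look like. This is hypothetical: convert_from_tags_keep_existing is
not part of CKAN, and the data layout is assumed as a simplified model. It
also shows the weakness of the approach: with no tag entries it cannot tell
"already converted" apart from "all tags were removed", so a stale value
would survive.

```python
def convert_from_tags_keep_existing(vocab_id):
    """Hypothetical converter that does not clobber an existing value.

    Not a CKAN API; the data layout is assumed for illustration only.
    """
    def converter(key, data, errors, context):
        tags = [
            data[k]['name']
            for k in list(data.keys())
            if k[0] == 'tags' and data[k].get('vocabulary_id') == vocab_id
        ]
        # Overwrite only when tag entries were found, or when there is
        # nothing at data[key] yet. Otherwise keep the existing value.
        if tags or key not in data:
            data[key] = tags
    return converter


convert = convert_from_tags_keep_existing('sphere-vocab')

# Already-converted dict: the existing list is left alone.
converted = {('sphere',): ['national']}
convert(('sphere',), converted, {}, {})

# Raw dict: tag entries are converted as usual.
raw = {('tags', 0): {'name': 'national', 'vocabulary_id': 'sphere-vocab'}}
convert(('sphere',), raw, {}, {})

# Dict with neither tag entries nor an existing value: empty list.
empty = {}
convert(('sphere',), empty, {}, {})
```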

   Best
   JD

I've reopened this at
https://github.com/stadt-karlsruhe/ckanext-extractor/issues/16

On Wed, 14 Nov 2018 at 07:19, JD Bothma <jd at openup.org.za> wrote:

> I've tracked this down to my vocabulary fields being empty in
> validated_data_dict in Solr after ckanext-extractor calls
> ckan.lib.search.index_for('package').update_dict(pkg_dict) in its tasks.py
> (as a background task). They are still present in data_dict and at the
> root of the document in Solr.
>
> These tag vocabulary fields work fine generally - if I set them using the
> API or UI, they're present in validated_data_dict and work as expected.
> It's just after update_index gets called from ckanext-extractor that they
> disappear.
>
> They are present in pkg_dict before update_dict is called.
>
> Could it be that they fail validation or that my show_package_schema is
> broken or not even called?
> https://github.com/OpenUpSA/ckanext-satreasury/blob/master/ckanext/satreasury/plugin.py#L111
>
> In pkg_dict they look like {... u'sphere': [u'national'],
> u'financial_year': [u'2018-19'] ...}
>
>
>
> On Tue, 30 Oct 2018 at 21:26, JD Bothma <jd at openup.org.za> wrote:
>
>> I thought I was going crazy.
>>
>> I think I tracked it down to an oldish version of ckanext-extractor -
>> I'll follow up there when I confirm -
>> https://github.com/stadt-karlsruhe/ckanext-extractor/issues/16
>>
>> Still not sure why it's happening in production but not locally.
>>
>> I think this has been hidden in the past because I used a script that
>> would update (and fix) the package each time I add a resource, and I
>> generally add XLS resources after adding PDF resources to the same
>> datasets, and I have extractor configured to only extract PDF resources.
>>
>> Best
>> JD
>>
>> On Tue, 30 Oct 2018 at 20:52, JD Bothma <jd at openup.org.za> wrote:
>>
>>> I have a situation where vocabulary tags get removed from a dataset when
>>> I add resources with resource_create.
>>>
>>> This only happens in production - I can't reproduce it locally.
>>>
>>> I can't tell what's different from how I used to create datasets and add
>>> resources - this used to work fine, but the problem showed up recently.
>>>
>>> It happens regardless of whether I create the resource immediately after
>>> creating the dataset, or a few minutes after creating the dataset.
>>>
>>> Has anyone seen anything like this?
>>>
>>> --
>>> JD Bothma
>>> Software Developer
>>> OpenUp
>>> +27 (0)79 281 6737
>>> +27 (0)21 671 6306
>>>


-- 
JD Bothma
Software Developer
OpenUp
+27 (0)79 281 6737
+27 (0)21 671 6306