[ckan-dev] Dataset vocabulary tags deleted when adding resource

JD Bothma jd at openup.org.za
Mon Nov 26 08:45:30 UTC 2018


Just to update,

Florian resolved this by using
search.rebuild(package_id=res_dict['package_id'])
instead of index_for('package').update_dict(pkg_dict)
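
In case it helps anyone reading the archive, here is a minimal sketch of that change. The helper name reindex_package_of is made up for illustration, and rebuild is passed in as an argument (it is meant to be ckan.lib.search.rebuild) so the snippet runs without a CKAN install. My understanding, which I haven't checked against every CKAN version, is that rebuild loads the package itself rather than trusting the dict the worker already holds, so the already-converted pkg_dict never reaches the indexer.

```python
def reindex_package_of(res_dict, rebuild):
    """Re-index the dataset that owns the resource described by res_dict.

    `rebuild` is assumed to be ckan.lib.search.rebuild. It is injected
    here only so that this sketch can be run without CKAN installed.
    """
    # Previously: index_for('package').update_dict(pkg_dict)
    rebuild(package_id=res_dict['package_id'])


# Stand-in for ckan.lib.search.rebuild that just records how it was called.
calls = []


def fake_rebuild(package_id=None):
    calls.append(package_id)


reindex_package_of({'id': 'res-1', 'package_id': 'pkg-1'}, fake_rebuild)
print(calls)
```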

Best
JD

On Tue, 20 Nov 2018 at 12:29, JD Bothma <jd at openup.org.za> wrote:

> The pkg_dict ckanext-extractor's worker gets from package_show on
> https://github.com/stadt-karlsruhe/ckanext-extractor/blob/master/ckanext/extractor/tasks.py#L62
> already has vocabulary tags converted.
>
> So when ckanext-extractor's worker calls
> index_for('package').update_dict(pkg_dict) on
> https://github.com/stadt-karlsruhe/ckanext-extractor/blob/master/ckanext/extractor/tasks.py#L110
> there aren't any ('tag', ..., ...) keys in the data argument to the
> converters.convert_from_tags callable
> https://github.com/ckan/ckan/blob/master/ckan/logic/converters.py#L93
>
> Since converters.convert_from_tags overwrites data[key] and the worker's
> index call ends up triggering a second conversion of the tag vocabulary
> fields, the worker ends up setting those fields to empty lists.
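
To make that concrete, here is a toy model of the double conversion. make_convert_from_tags is a simplified stand-in written for this example, not CKAN's actual converters.convert_from_tags, and the key layout and the vocabulary id 'spheres' are assumptions chosen to keep it short. The only property it shares with the real converter is the one described above: it collects matching tag entries and then overwrites data[key] unconditionally.

```python
def make_convert_from_tags(vocab_id):
    """Simplified stand-in for a convert_from_tags-style converter."""
    def convert(key, data):
        names = [
            value['name']
            for k, value in list(data.items())
            # Only tag entries belonging to this vocabulary are collected.
            if k[0] == 'tags' and value.get('vocabulary_id') == vocab_id
        ]
        # Unconditional overwrite: whatever was in data[key] is lost.
        data[key] = names
    return convert


convert_sphere = make_convert_from_tags('spheres')

# First conversion: the vocabulary tag is still present as a tag entry.
unconverted = {
    ('tags', 0): {'name': 'national', 'vocabulary_id': 'spheres'},
    ('tags', 1): {'name': 'budget', 'vocabulary_id': None},
}
convert_sphere(('sphere',), unconverted)
first_result = unconverted[('sphere',)]

# Second conversion: the dict already carries the converted field, and the
# vocabulary tag no longer appears among the tag entries.
already_converted = {
    ('sphere',): ['national'],
    ('tags', 0): {'name': 'budget', 'vocabulary_id': None},
}
convert_sphere(('sphere',), already_converted)
second_result = already_converted[('sphere',)]

print(first_result)   # ['national']
print(second_result)  # []
```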
>
> I think the following are reasonable options, but I'm not sure what the
> best one is and would like help/input:
>
>    - the worker should be operating on a pre-converted pkg_dict so that
>    converting it (when indexing it) has the expected result
>       - in this case, how? Is there a context flag to package_show that
>       can give an unconverted dict?
>    - index_for('package').update_dict(pkg_dict) should handle an
>    already-converted dict safely, perhaps by having an option not to convert
>    it a second time
>       - how? Its only optional argument is defer_commit
>    - converters should be idempotent, in which case this is a CKAN bug
>       - unlikely - it sounds weird and there isn't really enough metadata
>       to support this safely, I don't think
>
>    Best
>    JD
>
> I've reopened this at
> https://github.com/stadt-karlsruhe/ckanext-extractor/issues/16
>
> On Wed, 14 Nov 2018 at 07:19, JD Bothma <jd at openup.org.za> wrote:
>
>> I've tracked this down: after ckanext-extractor calls
>> ckan.lib.search.index_for('package').update_dict(pkg_dict) in its tasks.py
>> (as a background task), my vocabulary fields are empty in
>> validated_data_dict in Solr, but still present in data_dict and at the
>> root of the Solr document.
>>
>> These tag vocabulary fields work fine generally - if I set them using
>> the API or UI, they're present in validated_data_dict and work as expected.
>> It's just after update_dict gets called from ckanext-extractor that they
>> disappear.
>>
>> They are present in pkg_dict before update_dict is called.
>>
>> Could it be that they fail validation, or that my show_package_schema is
>> broken or not even called?
>> https://github.com/OpenUpSA/ckanext-satreasury/blob/master/ckanext/satreasury/plugin.py#L111
>>
>> In pkg_dict they look like {... u'sphere': [u'national'],
>> u'financial_year': [u'2018-19'] ...}
>>
>>
>>
>> On Tue, 30 Oct 2018 at 21:26, JD Bothma <jd at openup.org.za> wrote:
>>
>>> I thought I was going crazy.
>>>
>>> I think I tracked it down to an oldish version of ckanext-extractor -
>>> I'll follow up there when I confirm -
>>> https://github.com/stadt-karlsruhe/ckanext-extractor/issues/16
>>>
>>> Still not sure why it's happening in production but not locally.
>>>
>>> I think this has been hidden in the past because I used a script that
>>> would update (and fix) the package each time I add a resource, and I
>>> generally add XLS resources after adding PDF resources to the same
>>> datasets, and I have extractor configured to only extract PDF resources.
>>>
>>> Best
>>> JD
>>>
>>> On Tue, 30 Oct 2018 at 20:52, JD Bothma <jd at openup.org.za> wrote:
>>>
>>>> I have a situation where vocabulary tags get removed from a dataset
>>>> when I add resources with resource_create.
>>>>
>>>> This only happens in production - I can't reproduce it locally.
>>>>
>>>> I can't tell what's different from how I used to create datasets and
>>>> add resources - this worked fine but recently this has shown up.
>>>>
>>>> It happens regardless of whether I create the resource immediately
>>>> after creating the dataset, or a few minutes after creating the dataset.
>>>>
>>>> Has anyone seen anything like this?
>>>>
>>>> --
>>>> JD Bothma
>>>> Software Developer
>>>> OpenUp
>>>> +27 (0)79 281 6737
>>>> +27 (0)21 671 6306
>>>>