[ckan-dev] potential lost resource in case of 2 simultaneous requests ?
Adrià Mercader
adria.mercader at okfn.org
Tue Jul 4 08:26:40 UTC 2017
Hi Alex,
Thanks for your thorough investigation. I think this is something
worth discussing in the dev meeting. Would you mind creating an issue on
the main repo with this information so we can discuss it there?
Thanks a lot.
Adrià
On 2 July 2017 at 01:27, Alex Gartner <alexandru.gartner+ckan at gmail.com>
wrote:
> I've slightly modified package_show() and the get() function of the Package
> model to use with_for_update()
> <http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.with_for_update>
> from SQLAlchemy. The idea is to issue a "select for update" when
> retrieving the package so that another request/transaction can't modify
> it in the meantime. I tested this locally and it seemed to work: it prevented
> the resource from disappearing in the scenario from my initial email.
>
> package_show(): added support for a 'for_update' flag in the context. The
> flag is set when resource_create() / resource_update() starts:
>
>     for_update = context.get('for_update', False)
>     pkg = model.Package.get(name_or_id, for_update)
>
>
> get() in the Package model:
>
>     @classmethod
>     def get(cls, reference, for_update=False):
>         '''Returns a package object referenced by its id or name.'''
>         query = meta.Session.query(cls).filter(cls.id == reference)
>         if for_update:
>             # Lock the matching row until the transaction ends
>             query = query.with_for_update()
>         pkg = query.first()
>         if pkg is None:
>             # Not found by id, so fall back to a lookup by name
>             pkg = cls.by_name(reference)
>         return pkg
>
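The effect of taking a lock before the read can be sketched without a database. The snippet below is only an in-memory analogy, not CKAN or SQLAlchemy code: `db`, `row_lock`, `modify_dataset` and the two change functions are hypothetical names, and a `threading.Lock` stands in for the row lock that "select for update" would take. Because each request holds the lock across its whole read-modify-write, the second request always reads what the first one wrote.

```python
import copy
import threading

# Hypothetical stand-in for the database: dataset "D" with one resource.
db = {"D": {"resources": [{"id": "resource1", "name": "old"}]}}

# Stands in for the row lock taken by SELECT ... FOR UPDATE (an assumption
# made for this sketch; a real database lock is held per transaction).
row_lock = threading.Lock()


def modify_dataset(dataset_id, change):
    """Run the whole read-modify-write while holding the lock."""
    with row_lock:
        pkg = copy.deepcopy(db[dataset_id])  # the locked read
        change(pkg)                          # the request's own modification
        db[dataset_id] = pkg                 # the write back


def add_resource2(pkg):
    # What R1 (resource_create) does in the scenario
    pkg["resources"].append({"id": "resource2", "name": "added by R1"})


def rename_resource1(pkg):
    # What R2 (resource_update) does in the scenario
    pkg["resources"][0]["name"] = "new"


threads = [
    threading.Thread(target=modify_dataset, args=("D", add_resource2)),
    threading.Thread(target=modify_dataset, args=("D", rename_resource1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_ids = sorted(r["id"] for r in db["D"]["resources"])
print(final_ids)
```

Whichever thread gets the lock first, both changes survive, because neither request can work from a copy that predates the other's write.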
> On Sat, Jul 1, 2017 at 1:59 AM, Alex Gartner <
> alexandru.gartner+ckan at gmail.com> wrote:
>
>> Hello,
>>
>> I'm wondering whether the fact that resource_create() (and similarly
>> resource_update()) does a package_show()
>> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L285>
>> followed later by a package_update()
>> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L302>
>> can potentially lead to a lost resource in some special cases.
>>
>> Since PostgreSQL uses "read committed" transaction isolation by default, I
>> think the following could happen:
>> Two almost simultaneous requests (R1, R2) reach the API and are handled
>> by different processes/threads. Both modify dataset *D, which
>> already has one resource (resource1)*.
>>
>> TIMELINE
>>
>>  1. (R1) starts *resource_create*(resource2) on dataset D
>>  2. (R2) starts *resource_update*(resource1) on dataset D
>>  3. (R2) does package_show(D) => *gets D with just resource1*
>>  4. (R2) changes resource1 in D
>>  5. (R1) does package_show(D) => gets D with resource1
>>  6. (R1) adds resource2 to D => D.resources = [resource1, resource2]
>>  7. (R1) does package_update(D)
>>  8. (R1) resource_create() finishes, everything is committed
>>     successfully to the db => D has 2 resources in the db
>>  9. (R2) does package_update(D) - please note that here D only has
>>     one resource as read in step 3
>> 10. (R2) resource_update() finishes, everything is committed
>>     successfully to the db => D has just resource1 (resource2 disappears)
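The timeline above can be replayed deterministically with a toy in-memory model. This is a minimal sketch under stated assumptions, not CKAN code: `db` is a plain dict standing in for the database, and the `package_show` / `package_update` functions here are hypothetical stand-ins that only mimic "read a private copy" and "write the whole dataset back".

```python
import copy

# Hypothetical stand-in for the database: dataset "D" with one resource.
db = {"D": {"resources": [{"id": "resource1", "name": "old"}]}}


def package_show(dataset_id):
    """Return a private copy of the dataset as currently stored."""
    return copy.deepcopy(db[dataset_id])


def package_update(dataset_id, data):
    """Overwrite the stored dataset, resources included, with the caller's copy."""
    db[dataset_id] = copy.deepcopy(data)


# Steps 3-4: R2 reads D and edits resource1 in its own copy.
r2_view = package_show("D")
r2_view["resources"][0]["name"] = "new"

# Steps 5-8: R1 reads D, appends resource2 and writes the result back.
r1_view = package_show("D")
r1_view["resources"].append({"id": "resource2", "name": "added by R1"})
package_update("D", r1_view)
after_r1 = [r["id"] for r in db["D"]["resources"]]

# Steps 9-10: R2 writes back its stale copy, which never contained resource2.
package_update("D", r2_view)
after_r2 = [r["id"] for r in db["D"]["resources"]]

print(after_r1)
print(after_r2)
```

After R1 writes, the stored dataset holds both resources; after R2 writes its older copy, only resource1 remains, which is the lost resource described in step 10.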
>>
>> Question: does this seem possible? I reproduced it
>> locally on a slightly modified CKAN running under paster, but that could also mean
>> that I have something misconfigured or changed. Before I start thinking
>> about strategies for avoiding this scenario (such as a different transaction
>> isolation level), is there some mechanism in CKAN that would prevent it? Has
>> anyone run into such an issue?
>>
>> Thank you,
>> Alex Gartner