[ckan-dev] potential lost resource in case of 2 simultaneous requests ?

Alex Gartner alexandru.gartner+ckan at gmail.com
Sun Jul 2 00:27:55 UTC 2017


I've changed package_show() and the get() function of the Package model a
bit to use "with_for_update()
<http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.with_for_update>"
from SQLAlchemy. The idea is to use "*SELECT ... FOR UPDATE*" when retrieving
the package, so that another request/transaction can't modify it in the
meantime. The row lock is held until the transaction commits or rolls back,
so a second request doing the same locked read waits and then sees the first
request's committed changes. I tested this locally and it seemed to work
fine: it prevented the resource from disappearing as in the scenario from my
initial email.
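The effect of the lock can be illustrated without a database. Below is a minimal sketch, assuming a toy in-memory "table" and a threading.Lock per row as a stand-in for the PostgreSQL row lock taken by SELECT ... FOR UPDATE (acquired at the read, released at commit). run_transaction(), update_resource1() and add_resource2() are hypothetical names, not CKAN code.

```python
import copy
import threading
import time

# Toy in-memory "table": dataset D starts with one resource.
db = {"D": {"resources": [{"id": "resource1", "name": "old"}]}}

# One lock per row, standing in for the row lock that SELECT ... FOR UPDATE
# takes in PostgreSQL and holds until the transaction ends.
row_locks = {"D": threading.Lock()}


def run_transaction(pkg_id, modify, pause=0.0):
    """Hypothetical read-modify-write of a whole package under a row lock."""
    with row_locks[pkg_id]:  # acquire ~ SELECT ... FOR UPDATE, release ~ COMMIT
        pkg = copy.deepcopy(db[pkg_id])  # read the latest committed version
        time.sleep(pause)  # widen the window between read and write
        modify(pkg)
        db[pkg_id] = pkg  # write back the whole package, like package_update


def update_resource1(pkg):  # what R2 does
    pkg["resources"][0]["name"] = "new"


def add_resource2(pkg):  # what R1 does
    pkg["resources"].append({"id": "resource2", "name": "added"})


r2 = threading.Thread(target=run_transaction, args=("D", update_resource1, 0.2))
r1 = threading.Thread(target=run_transaction, args=("D", add_resource2))
r2.start()
time.sleep(0.05)  # let R2 read first, as in the timeline
r1.start()
r2.join()
r1.join()

print(sorted(r["id"] for r in db["D"]["resources"]))  # ['resource1', 'resource2']
print(db["D"]["resources"][0]["name"])  # new
```

Whichever thread gets the lock first, the other one reads the package only after the first has written it back, so both changes survive. Dropping the `with` line (and dedenting its body) typically reproduces the lost update: R1 reads D while R2 is still sleeping, and R2's stale copy then overwrites resource2.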

*package_show()* - added support for a 'for_update' flag in the context. The
flag is set at the start of resource_create() / resource_update():

    for_update = context.get('for_update', False)
    pkg = model.Package.get(name_or_id, for_update)


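*get() in the Package model*

    @classmethod
    def get(cls, reference, for_update=False):
        '''Returns a package object referenced by its id or name.'''
        query = meta.Session.query(cls).filter(cls.id == reference)
        if for_update:
            # SELECT ... FOR UPDATE: lock the row until the transaction ends
            query = query.with_for_update()
        pkg = query.first()
        if pkg is None:
            # Not found by id: fall back to the name, locking that row too
            # (a plain by_name() lookup would read it without a lock)
            query = meta.Session.query(cls).filter(cls.name == reference)
            if for_update:
                query = query.with_for_update()
            pkg = query.first()
        return pkg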
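To show how the flag travels from the action down to the model, here is a stripped-down sketch. Package, package_show() and resource_create() below are hypothetical stand-ins that only model the flag handling; the real CKAN action functions also do validation, authorization and much more, and the toy get() just records whether a lock was requested.

```python
class Package:
    # Records each lookup so we can see whether a lock was requested.
    lookups = []

    @classmethod
    def get(cls, reference, for_update=False):
        cls.lookups.append((reference, for_update))
        return {"id": reference, "resources": []}


def package_show(context, data_dict):
    # Read the flag from the context; plain package_show calls stay lock-free.
    for_update = context.get("for_update", False)
    return Package.get(data_dict["id"], for_update)


def resource_create(context, data_dict):
    # Ask package_show to lock the dataset row before reading it.
    context["for_update"] = True
    pkg = package_show(context, {"id": data_dict["package_id"]})
    pkg["resources"].append(data_dict)
    return pkg


package_show({}, {"id": "D"})
resource_create({}, {"package_id": "D", "name": "resource2"})
print(Package.lookups)  # [('D', False), ('D', True)]
```

Defaulting the flag to False means only the read-modify-write actions take the lock; ordinary reads through package_show are unaffected.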




On Sat, Jul 1, 2017 at 1:59 AM, Alex Gartner <
alexandru.gartner+ckan at gmail.com> wrote:

> Hello,
>
> I'm wondering whether the fact that resource_create() (and similarly
> resource_update()) does a package_show()
> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L285>
> followed later by a package_update()
> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L302>
> can lead to a lost resource in some special cases.
>
> Since PostgreSQL uses "read committed" transaction isolation by default, I
> think the following could happen:
> two almost simultaneous requests (R1, R2) reach the API and are handled
> by different processes/threads. Both modify dataset *D, which
> already has one resource (resource1)*.
>
> TIMELINE
>
>    1.  (R1) starts *resource_create*(resource2) on dataset D
>    2.  (R2) starts *resource_update*(resource1) on dataset D
>    3.  (R2) does package_show(D) => *gets D with just resource1*
>    4.  (R2) changes resource1 in D
>    5.  (R1) does package_show(D) => gets D with resource1
>    6.  (R1) adds resource2 to D => D.resources = [resource1, resource2]
>    7.  (R1) does package_update(D)
>    8.  (R1) resource_create() finishes, everything is committed
>    successfully to the db => D has 2 resources in the db
>    9.  (R2) does package_update(D); note that here D has only one
>    resource, as read in step 3
>    10.  (R2) resource_update() finishes, everything is committed
>    successfully to the db => D has just resource1 (resource2 disappears)
>
> Question: does this seem possible? I reproduced it locally on a slightly
> modified CKAN running under paster, but that could also mean that I have
> misconfigured or changed something. Before thinking about strategies for
> avoiding this scenario (like a different transaction isolation level), is
> there some mechanism in CKAN that would prevent this? Has anyone run into
> such an issue?
>
> Thank you,
> Alex Gartner
>
>
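To make the timeline in the quoted message concrete, here is a deterministic replay of steps 3 to 10 in plain Python, with no locking. read_package() and write_package() are hypothetical stand-ins: the first returns a private copy of the latest committed state (what a plain SELECT sees under read committed), the second overwrites the whole resource list, assuming, as the scenario does, that package_update replaces D's resources with the list it is given.

```python
import copy

# Toy in-memory "table": dataset D starts with one resource.
db = {"D": {"resources": [{"id": "resource1", "name": "old"}]}}


def read_package(pkg_id):
    # A plain read returns the latest committed state; the request then
    # works on its own in-memory copy.
    return copy.deepcopy(db[pkg_id])


def write_package(pkg_id, pkg):
    # The write replaces the whole resource list with the one supplied.
    db[pkg_id] = copy.deepcopy(pkg)


# Steps 3-4: R2 reads D and changes resource1 in its private copy.
r2_pkg = read_package("D")
r2_pkg["resources"][0]["name"] = "new"

# Steps 5-8: R1 reads D, appends resource2, writes and commits.
r1_pkg = read_package("D")
r1_pkg["resources"].append({"id": "resource2", "name": "added"})
write_package("D", r1_pkg)
after_r1 = [r["id"] for r in db["D"]["resources"]]

# Steps 9-10: R2 writes back its stale copy, which never saw resource2.
write_package("D", r2_pkg)
after_r2 = [r["id"] for r in db["D"]["resources"]]

print(after_r1)  # ['resource1', 'resource2']
print(after_r2)  # ['resource1']
```

This is the classic lost-update pattern: each request's write is valid on its own, but R2's write is based on a read that predates R1's commit.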