[ckan-committers] Security Issue: Datastore SQL search gives access to system tables

Tue Jul 29 12:25:32 UTC 2014

Hi all,

I'm sending this to this list as this is a potential security issue
relating to the datastore.
If we later decide it's fine to share it on the list I'll forward it.

Basically our implementation of `datastore_search_sql` allows users to
query the internal PostgreSQL tables.

If the permissions are properly set (which unfortunately it's not a
given, we need to improve that) the user running the queries still
wouldn't be able to modify them, but still it is a pretty bad
information leakage, as you can get eg

* List of roles
http://demo.ckan.org/api/action/datastore_search_sql?sql=select%20*%20from%20pg_catalog.pg_roles

* List of tables
http://demo.ckan.org/api/action/datastore_search_sql?sql=select%20*%20from%20pg_catalog.pg_tables

etc

I would really like this to be included in the patch releases for 2.0
to 2.2, as well as master so I took a look at some available options:

1. Use sqlparse and parse the incoming SQL for system tables

    https://github.com/andialbrecht/sqlparse/blob/master/examples/extract_table_names.py

    Pros: Easy to integrate (on master, backporting would be another issue)

    Cons: - Not sure how well maintained
          - Some statements go through (eg select
"2e12d91b-63aa-4919-8d3d-db3b489d4b14"."LAT",  (select rolname from
pg_catalog.pg_roles LIMIT 1) FROM
"2e12d91b-63aa-4919-8d3d-db3b489d4b14")

2. Look how others do it: use CartoDB SQL scripts:
    - https://github.com/CartoDB/cartodb-postgresql/blob/master/scripts-available/CDB_QueryTables.sql
    - https://github.com/CartoDB/cartodb-postgresql/blob/master/scripts-available/CDB_QueryStatements.sql

    SELECT CDB_QueryTables(<original query>)

    Pros: Tested and maintained

    Cons: - Hard to install
          - Requires plpythonu
          - Only works on PG 9.x
          - License?

3. Call EXPLAIN (FORMAT JSON) <statement> from the datastore code and
parse the results for getting the tables before running the actual
command

eg: for select "2e12d91b-63aa-4919-8d3d-db3b489d4b14"."LAT",  (select
rolname from pg_catalog.pg_roles LIMIT 1) FROM
"2e12d91b-63aa-4919-8d3d-db3b489d4b14"

    [{u'Plan': {u'Alias': u'2e12d91b-63aa-4919-8d3d-db3b489d4b14',
            u'Node Type': u'Seq Scan',
            u'Plan Rows': 1775,
            u'Plan Width': 8,
            u'Plans': [{u'Node Type': u'Limit',
                        u'Parent Relationship': u'InitPlan',
                        u'Plan Rows': 1,
                        u'Plan Width': 68,
                        u'Plans': [{u'Alias': u'pg_authid',
                                    u'Node Type': u'Seq Scan',
                                    u'Parent Relationship': u'Outer',
                                    u'Plan Rows': 1,
                                    u'Plan Width': 68,
                                    u'Relation Name': u'pg_authid',
                                    u'Startup Cost': 0.0,
                                    u'Total Cost': 1.01}],
                        u'Startup Cost': 0.0,
                        u'Subplan Name': u'InitPlan 1 (returns $0)',
                        u'Total Cost': 1.01}],
            u'Relation Name': u'2e12d91b-63aa-4919-8d3d-db3b489d4b14',
            u'Startup Cost': 1.01,
            u'Total Cost': 136.76}}]

    Pros: - No dependencies
          - Easy to backport

    Cons: - Not sure how easy it is to get the tables (need to look into it)
          - Differences between PG 9.x and 8.x?
          - Performance issues?

I personally think that our best option is 3. Of course I need to
investigate EXPLAIN a bit more but seems like iter the dict looking
for "Relation Name" might be a good approach. Let me know if you think
otherwise or you can think of a better option.

Any thoughts welcome

Adrià