[ckan-committers] Security Issue: Datastore SQL search gives access to system tables
Adrià Mercader
adria.mercader at okfn.org
Tue Jul 29 12:25:32 UTC 2014
Hi all,
I'm sending this to this list as this is a potential security issue
relating to the datastore.
If we later decide it's fine to share it on the list I'll forward it.
Basically our implementation of `datastore_search_sql` allows users to
query the internal PostgreSQL tables.
If the permissions are properly set (which unfortunately it's not a
given, we need to improve that) the user running the queries still
wouldn't be able to modify them, but still it is a pretty bad
information leakage, as you can get eg
* List of roles
http://demo.ckan.org/api/action/datastore_search_sql?sql=select%20*%20from%20pg_catalog.pg_roles
* List of tables
http://demo.ckan.org/api/action/datastore_search_sql?sql=select%20*%20from%20pg_catalog.pg_tables
etc
I would really like this to be included in the patch releases for 2.0
to 2.2, as well as master so I took a look at some available options:
1. Use sqlparse and parse the incoming SQL for system tables
https://github.com/andialbrecht/sqlparse/blob/master/examples/extract_table_names.py
Pros: Easy to integrate (on master, backporting would be another issue)
Cons: - Not sure how well maintained
- Some statements go through (eg select
"2e12d91b-63aa-4919-8d3d-db3b489d4b14"."LAT", (select rolname from
pg_catalog.pg_roles LIMIT 1) FROM
"2e12d91b-63aa-4919-8d3d-db3b489d4b14")
2. Look how others do it: use CartoDB SQL scripts:
- https://github.com/CartoDB/cartodb-postgresql/blob/master/scripts-available/CDB_QueryTables.sql
- https://github.com/CartoDB/cartodb-postgresql/blob/master/scripts-available/CDB_QueryStatements.sql
SELECT CDB_QueryTables(<original query>)
Pros: Tested and maintained
Cons: - Hard to install
- Requires plpythonu
- Only works on PG 9.x
- License?
3. Call EXPLAIN (FORMAT JSON) <statement> from the datastore code and
parse the results for getting the tables before running the actual
command
eg: for select "2e12d91b-63aa-4919-8d3d-db3b489d4b14"."LAT", (select
rolname from pg_catalog.pg_roles LIMIT 1) FROM
"2e12d91b-63aa-4919-8d3d-db3b489d4b14"
[{u'Plan': {u'Alias': u'2e12d91b-63aa-4919-8d3d-db3b489d4b14',
u'Node Type': u'Seq Scan',
u'Plan Rows': 1775,
u'Plan Width': 8,
u'Plans': [{u'Node Type': u'Limit',
u'Parent Relationship': u'InitPlan',
u'Plan Rows': 1,
u'Plan Width': 68,
u'Plans': [{u'Alias': u'pg_authid',
u'Node Type': u'Seq Scan',
u'Parent Relationship': u'Outer',
u'Plan Rows': 1,
u'Plan Width': 68,
u'Relation Name': u'pg_authid',
u'Startup Cost': 0.0,
u'Total Cost': 1.01}],
u'Startup Cost': 0.0,
u'Subplan Name': u'InitPlan 1 (returns $0)',
u'Total Cost': 1.01}],
u'Relation Name': u'2e12d91b-63aa-4919-8d3d-db3b489d4b14',
u'Startup Cost': 1.01,
u'Total Cost': 136.76}}]
Pros: - No dependencies
- Easy to backport
Cons: - Not sure how easy it is to get the tables (need to look into it)
- Differences between PG 9.x and 8.x?
- Performance issues?
I personally think that our best option is 3. Of course I need to
investigate EXPLAIN a bit more but seems like iter the dict looking
for "Relation Name" might be a good approach. Let me know if you think
otherwise or you can think of a better option.
Any thoughts welcome
Adrià
More information about the ckan-committers
mailing list