[ckan-discuss] RDBMS Data Source

Haq, Salman Salman.Haq at neustar.biz
Wed Mar 21 22:14:28 GMT 2012



On 3/19/12 5:07 AM, "Rufus Pollock" <rufus.pollock at okfn.org> wrote:

>On 19 March 2012 00:47, Haq, Salman <Salman.Haq at neustar.biz> wrote:
>> Hi,
>>
>> I hope this is the right forum for this question:
>>
>> Currently CKAN supports three types of data sources: CSV/Excel file
>> upload, hyperlink to API, hyperlink to CSV/Excel file.
>>
>> Is there a plugin that supports a relational database as fourth type of
>> data source?
>
>No, but it's a really interesting idea. You'd be wanting to add to
>this extension:
>
><https://github.com/okfn/ckanext-datastorer>
>
>Specifically (I think):
>
>https://github.com/okfn/ckanext-datastorer/blob/master/ckanext/datastorer/tasks.py#L86
>
>@David (Raznick): any thoughts on this?
>
>> Such a plugin could be provided a username/password combination to
>> connect to the server and extract metadata information automatically.
>> Pyodbc or SQLAlchemy could be used for implementing this.
>
>Indeed. The issue here is how to provide the username / password info ...
>
>Rufus

Rufus,

Thanks for your response. As a follow-up, I have a few other ideas on
which I would welcome feedback.

1. Support for (partial) SQL dumps.
This is in the same vein as connecting directly to a database, except
that, as you pointed out, login information is usually guarded. A SQL
dump file, however, can be parsed to learn the schema without any
credentials. We can limit ourselves to 'create table' and 'create view'
statements, as they are sufficient to understand the schema.
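
As a rough illustration of the idea, the Python sketch below pulls only
the 'create table' and 'create view' statements out of a dump. The helper
name extract_schema_statements and the sample dump are made up for this
example, and the regular expression is deliberately naive: it assumes each
statement ends at the first ';' and that no ';' appears inside a string
literal or comment within those statements. A real implementation would
need a proper SQL parser.

```python
import re

# Naive pattern: CREATE [OR REPLACE] TABLE|VIEW [IF NOT EXISTS] <name> ... ;
# Assumption: the first ';' after the name terminates the statement.
CREATE_RE = re.compile(
    r'\bCREATE\s+(?:OR\s+REPLACE\s+)?(TABLE|VIEW)\s+'
    r'(?:IF\s+NOT\s+EXISTS\s+)?([`"\w.]+)(.*?);',
    re.IGNORECASE | re.DOTALL,
)


def extract_schema_statements(dump_text):
    """Return (kind, name, statement) for each CREATE TABLE/VIEW found."""
    results = []
    for match in CREATE_RE.finditer(dump_text):
        kind = match.group(1).upper()
        # Drop backtick or double-quote identifier quoting from the name.
        name = re.sub(r'[`"]', '', match.group(2))
        results.append((kind, name, match.group(0).strip()))
    return results


# Hypothetical dump fragment; the INSERT statement is ignored.
SAMPLE_DUMP = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);
INSERT INTO users VALUES (1, 'a@example.org');
CREATE VIEW user_emails AS SELECT email FROM users;
"""

for kind, name, _statement in extract_schema_statements(SAMPLE_DUMP):
    print(kind, name)
```

Data rows ('insert' statements) are skipped entirely, so a partial dump
containing only DDL would work just as well.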

2. 'Schema' preview for resources.
Given that we can preview a resource in 'Grid' and 'Graph' form, what
about an additional view that lists only the schema derived from the
resource? RDF would be helpful here, although it's not clear to me
whether there are well-known ontologies for describing spreadsheets and
relational databases. If satisfactory ontologies do not exist, perhaps
they can be collaboratively developed by the OKFN community?
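
To make the 'Schema' view concrete, here is a minimal Python sketch that
derives a column list from a CSV resource. The function names derive_schema
and infer_type, the three type labels, and the sample data are all
assumptions for illustration; this is not how CKAN derives types.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest of integer, float, or text that fits all
    non-empty values in a column."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for value in non_empty:
                caster(value)
            return label
        except ValueError:
            continue
    return "text"


def derive_schema(csv_text):
    """Return a list of (column_name, inferred_type) from CSV text.
    Assumes the first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns; an empty file yields empty columns.
    columns = list(zip(*data)) if data else [() for _ in header]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Hypothetical sample resource.
SAMPLE_CSV = "city,population,lat\nLondon,8174000,51.5\nParis,2249975,48.85\n"

for name, type_label in derive_schema(SAMPLE_CSV):
    print(name, type_label)
```

The same (name, type) list could then be serialised to RDF once a
suitable vocabulary is chosen.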

3. Semantic types for attributes in the dataset.
Provide a way for data custodians to associate a higher-level type with
columns in a spreadsheet or relational table. Examples of semantic types
include email address, street address, telephone number, and latitude.
This can be part of the 'Schema' preview.
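
One possible shape for this, sketched in Python: the custodian supplies a
mapping from column name to semantic type, and it is merged into the
schema listing. The annotate function, the set of type names, and the
example columns are hypothetical; a shared vocabulary would have to be
agreed on first.

```python
# Illustrative vocabulary only, taken from the examples in this message.
KNOWN_SEMANTIC_TYPES = {
    "email_address",
    "street_address",
    "telephone_number",
    "latitude",
}


def annotate(schema, semantic_map):
    """Attach a semantic type (or None) to each (column, storage_type) pair.

    schema:       list of (column_name, storage_type) tuples
    semantic_map: dict of column_name -> semantic type chosen by a custodian
    """
    unknown = set(semantic_map.values()) - KNOWN_SEMANTIC_TYPES
    if unknown:
        raise ValueError("unknown semantic types: %s" % sorted(unknown))
    return [
        {
            "column": name,
            "type": storage_type,
            "semantic_type": semantic_map.get(name),
        }
        for name, storage_type in schema
    ]


# Hypothetical schema and custodian-supplied annotations.
schema = [("contact", "text"), ("lat", "float"), ("notes", "text")]
annotated = annotate(schema, {"contact": "email_address", "lat": "latitude"})

for entry in annotated:
    print(entry["column"], entry["type"], entry["semantic_type"])
```

Columns without an annotation simply carry None, so the annotations can
be added incrementally.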

Thanks,
Shaq




