[ECODP-dev] links of the mime type
David Raznick
david.raznick at okfn.org
Fri Aug 23 11:52:44 UTC 2013
Hello,
Darwin is away today, so I will answer on his behalf.
On 22 August 2013 21:23, Bert Van Nuffelen
<bert.van.nuffelen at tenforce.com> wrote:
> Hi Leda,
>
> thanks for reviewing this. I've put Darwin in cc so that he can follow
> this track.
>
> Let's start with some context.
>
> This json is the list of accepted values for the format and mime-type
> fields in CKAN.
>
> Each entry in the json is of the form: <1> : [ <2>, <3>, <4> ] where
> <1> = the value entered by the publisher
> <2> = the unifying key (*)
> <3> = the ( short ) label
> <4> = the ( long ) label for description/tooltip purpose
>
> (*) @Darwin, I am not sure of this.
>
Yes, <2> is the unifying key: it is the value that we want to be stored in
CKAN.
>
> So to answer the second question: yes there is a mixture as it is the
> union serving 2 fields.
> In addition, this 'mixture' is enhanced by the standard CKAN approach to
> support the capturing of typical human ways of communicating data formats.
> For example, we talk amongst ourselves about html pages, so a human
> publisher using the web interface will typically enter 'html' instead of
> the standard technical notation 'text/html'.
> In an automated machine process such relaxation is typically not applied.
>
> W.r.t. relaxation:
>
> The formats currently used in the EU ODP can be retrieved with this
> SPARQL query:
> select distinct ?o where {?s <http://ec.europa.eu/open-data/ontologies/ec-odp#distributionFormat> ?o} limit 200
>
> As you can see, about 70-80% use a technical mime-type representation.
> (But even that is no guarantee of one unique representation for the same
> data format: see application/rdf+xml and rdf/xml.)
> The others use variations of the human format notation.
>
> From a process point of view: -- @Darwin, correct me if I am wrong here --
> If the JSON were limited to exactly one line for the html page case:
> "text/html": ["text/html", "HTML", "Web Page"],
>
> only the value 'text/html' would be accepted in the CKAN database.
>
> At the user interface level nothing would change. @Darwin, this is correct
> I believe?
>
> @Darwin, which part of the json will be used for the RDF format
> representation: <1> or <2>?
>
We want the publishers to store <2>. The only reason there are repeated
values is that publishers have historically entered many variant formats
that need to be mapped to the correct form. The mapping file is based on
looking at the database and mapping what was actually entered in the past.
In a perfect world every value in column <1> would already be a value in
column <2>, but in the current data that is not the case.
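To make the intent concrete, here is a rough sketch of how a value entered by
a publisher could be resolved to <2> using the mapping. This is only an
illustration, not the actual ckanext-ecportal code: the function name and the
inline sample are made up, and the sample entries are the ones quoted in this
thread.

```python
import json

# Sample entries in the shape described in this thread:
#   <1> entered value : [ <2> unifying key, <3> short label, <4> long label ]
# (Hypothetical inline sample; the real data lives in resource_mapping.json.)
SAMPLE_MAPPING = json.loads("""
{
  "text/html": ["text/html", "HTML", "Web Page"],
  "htm": ["text/html", "HTML", "Web Page"],
  "html": ["text/html", "HTML", "Web Page"],
  "http://purl.org/net/mediatypes/text/html": ["text/html", "HTML", "Web Page"]
}
""")


def unifying_key(entered_value, mapping=SAMPLE_MAPPING):
    """Return the <2> value for a publisher-entered <1> value.

    Returns None when the value is not in the mapping. The lookup is an
    exact match; whether the real code normalises case or whitespace is
    not something this sketch assumes.
    """
    entry = mapping.get(entered_value)
    if entry is None:
        return None
    return entry[0]
```

With this sample, 'htm', 'html' and 'text/html' all resolve to 'text/html',
which is the value we want stored.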
We have also added another file:
https://github.com/okfn/ckanext-ecportal/blob/next/data/resource_dropdown.json
This is what will end up in the front end form for publishers to select.
This file is much cleaner and maps the value in <2> to a human readable
format.
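As a sketch of how such a file could feed the form: the snippet below assumes
the dropdown data is a flat object mapping the <2> value to a human readable
label. That layout, the function name and the sample labels are assumptions
for illustration; check the file itself for the real structure.

```python
# ASSUMPTION: a flat {<2> value: human readable label} object. The real
# resource_dropdown.json may be laid out differently.
SAMPLE_DROPDOWN = {
    "text/html": "HTML",
    "application/rdf+xml": "RDF/XML",  # illustrative label, not from the file
}


def dropdown_options(dropdown):
    """Return (stored value, displayed label) pairs sorted by label."""
    return sorted(dropdown.items(), key=lambda item: item[1].lower())


for value, label in dropdown_options(SAMPLE_DROPDOWN):
    # The publisher sees the label; the <2> value is what gets stored.
    print(f"{label} -> {value}")
```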
Thanks
David
>
> I hope this clarifies a bit the information.
>
> best regards,
>
> Bert
>
> PS: I am out of office until Monday evening, so any further follow-up from
> my side will be done from Tuesday on.
>
>
> 2013/8/22 BARGIOTTI Leda (OP) <Leda.BARGIOTTI at publications.europa.eu>
>
>> Hi Bert,
>>
>> Could you please explain this list?
>>
>> 1. More specifically, could you please tell us what each element means?
>> For example:
>>
>> "text/html": ["text/html", "HTML", "Web Page"]
>>
>> · "text/html": ?
>> · "text/html": ?
>> · "HTML": ?
>> · "Web Page": ?
>>
>> 2. Secondly, it seems that this list is a mix of formats and other things
>> such as "application/sparql-query".
>>
>> 3. Thirdly, how shall we interpret elements that are listed more than
>> once? E.g.:
>>
>> "text/html": ["text/html", "HTML", "Web Page"],
>> "htm": ["text/html", "HTML", "Web Page"],
>> "html": ["text/html", "HTML", "Web Page"],
>> "http://purl.org/net/mediatypes/text/html": ["text/html", "HTML", "Web Page"],
>>
>> 4. Fourthly: where does this list come from exactly? Is it a SPARQL query
>> of the ODP?
>>
>> If our objective is to have a list of file types to be used both in RDF
>> and as a drop-down list in the UI, we need to clearly understand which one
>> is the code and which one is the label, and whether the same file type is
>> listed more than once. Maybe you discussed this already with the other
>> members of the team, but I would really appreciate it if you could help me
>> shed some light on this, otherwise I will not be able to come up with a
>> good list.
>>
>> Thank you in advance
>>
>> Kind regards
>>
>> Leda
>>
>> From: Bert Van Nuffelen [mailto:bert.van.nuffelen at tenforce.com]
>> Sent: Thursday, August 22, 2013 4:29 PM
>> To: PASTOR CAMARASA José Juan (OP); BARGIOTTI Leda (OP)
>> Cc: ISOARD Olivier (OP)
>> Subject: links of the mime type
>>
>> Hi José,
>>
>> You can find the list of mime-types online at
>> https://github.com/okfn/ckanext-ecportal/blob/resource_formats/data/resource_mapping.json
>>
>> I have also attached a downloaded version to this mail.
>>
>> Bert
>>
>>
>>
>> --
>> Bert Van Nuffelen
>>
>> Semantic Technologies Software Architect at TenForce
>> www.tenforce.be
>>
>> Bert.Van.Nuffelen at tenforce.com
>> Office: +32 (0)16 31 48 60
>> Mobile:+32 479 06 24 26
>> skype: bert.van.nuffelen
>>
>
>
>
>