[ckan-dev] dataset urls

Toby Dacre toby.okfn at gmail.com
Wed Apr 25 12:19:13 UTC 2012


As a compromise that gives us both nice urls and permanent urls, how about:

*non-permanent* urls stay as they are currently

/dataset/my-nice-name

*permanent* urls have an extra piece of info that stays fixed for a
dataset

/dataset/~a4fG2
/dataset/~a4fG2/my-nice-name <- name bit optional and can be anything
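
A rough sketch of how the two url forms above could be told apart. This is only an illustration, not existing CKAN code: the function name, the regexes and the assumption that a dataset name never starts with "~" are all hypothetical.

```python
import re

# Hypothetical patterns: "~" plus 5 characters from a-z A-Z 0-9 - _,
# optionally followed by a free-form name that is ignored for lookup.
PERMANENT = re.compile(r"^/dataset/~([A-Za-z0-9_-]{5})(?:/([^/]*))?$")
# Assumption: ordinary dataset names never start with "~".
BY_NAME = re.compile(r"^/dataset/([^/~][^/]*)$")


def parse_dataset_path(path):
    """Return (kind, key) for a dataset path, or None if neither form matches."""
    match = PERMANENT.match(path)
    if match:
        # The trailing name is decorative, so only the short id is the key.
        return ("permanent", match.group(1))
    match = BY_NAME.match(path)
    if match:
        return ("name", match.group(1))
    return None
```

With this, /dataset/~a4fG2 and /dataset/~a4fG2/anything both resolve to the same short id, while /dataset/my-nice-name is still looked up by name.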

Using a-z A-Z 0-9 - _ we have 64 chars, so 5 characters give us
64^5 = 1,073,741,824 combinations, which should do.

This would give us permanent urls that are not too ugly.
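
A minimal sketch of generating such an id, assuming we pick the 5 characters at random and retry if the id is already taken (the save-time check David describes in the quoted mail below). The names here are made up for illustration, and the existing_ids set stands in for whatever database lookup we would really use.

```python
import secrets
import string

# 26 + 26 + 10 + 2 = 64 url-safe characters.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "-_"
ID_LENGTH = 5


def new_short_id(existing_ids, max_attempts=10):
    """Return a random 5-character id that is not in existing_ids."""
    for _attempt in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _i in range(ID_LENGTH))
        # Hypothetical uniqueness check; a real version would query the database.
        if candidate not in existing_ids:
            return candidate
    raise RuntimeError("could not find an unused short id")


# Size of the id space: 64 ** 5 == 1073741824
ID_SPACE = len(ALPHABET) ** ID_LENGTH
```

The id is only ever checked for uniqueness when it is created, so a lookup by id at read time never has to deal with collisions.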


1) Would these be acceptable?

2) If so, where would we use which?

Toby



On 25 April 2012 09:13, David Raznick <kindly at gmail.com> wrote:

>
>
> On Wed, Apr 25, 2012 at 1:24 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
>> On 25 April 2012 01:07, David Raznick <kindly at gmail.com> wrote:
>> >
>> >
>> >> We are talking about comment links atm.
>> >>
>> >> Re name change you get broken
>> >> links on github if you rename your repo.
>> >>
>> >> We could also implement a
>> >> simple redirector by looking up in the dataset revision table for old
>> >> names :-)
>> >
>> >
>> > We considered this but it would not work unless we were sure that all names
>> > are unique forever.
>>
>> But why does this need to be perfect? If someone renames and then some
>> other dataset replaces it fine - o/w this would work :-)
>>
>
>> I'm not sure we are debating the same thing here. dataset uuid can be used
>> for things that absolutely need to be permanent forever (e.g. rdf uris,
>> permanent identifiers for syncing). But for other stuff it's not the
>> end of the world if something breaks (if that is rare and people are
>> warned of risk)
>>
>> >>
>> >> > changing *HARD* because the last thing we want to do is confront users
>> >> > with more choices than necessary.  They should not be forced to think
>> >>
>> >> Why shouldn't it be like github repos. You can change but you are
>> >> warned about problems. Pick a good name.
>> >
>> >
>> > If we cared that much about the name we would not sluggify the title and
>> > force people to make good ones.  Github forces you to do this.
>>
>> We used to do this. I have pushed several times for making dataset
>> name sluggification better (remove article, warning people if long
>> ...). Github btw now has something similar to what we do.
>>
>> >> through the consequences of their actions and read some blurb as to why
>> >> it's bad if we can make it avoidable.
>> >
>> >> Understood. That said I frequently type in the names of familiar
>> >> datasets (but I may be unusual). That's never possible once we have a
>> >> somewhat random id in there. But that's then a question of usage. I
>> >> think DataHub at least is more like GitHub (or Twitter) in that
>> >> regard: I care about this entity's name a lot (compared to say
>> >> StackOverflow where I always arrive via google or similar).
>> >>
>> > I think that the relevance of the name has much less consequence to us than
>> > github but more than stackoverflow.  I am happy to keep the ability to
>>
>> OK, interesting. I don't see it that way so much.
>>
>> > reference by name only in the url, but not give that out when systematically
>> > creating permanent links, like in this case.
>>
>> To repeat: the disqus system will reference the permanent identifier
>> and the disqus_url is, IIRC, a convenience (used from recent comments
>> etc).
>>
>> > This can include things like activity streams, social stuff, apis to update
>> > qa information, feeds, and anywhere we give out urls that we expect other
>> > services to use. We could use just uuids for these but some would also
>> > benefit from being less ugly.
>>
>> There are two distinct discussions:
>>
>> * Do we want uuids exposed for datasets (by default) in the UI? I'm
>> saying no, you're saying yes :-)
>>
>
>> * Do we want uuids exposed for datasets elsewhere (e.g. in activity
>> streams, qa etc). Probable agreement ...
>>  * I'm not quite sure what this means. Internally we ref the dataset
>> object. Hence we can always change the url link at least in our system
>> as this updates. For some things like RSS feeds given out to others
>> this is more problematic (and I'd be happy with uuid/{friendly-name}).
>>
>> * All agree: If we have to use uuids we can make them less ugly (e.g.
>> by appending title). I'm concerned about shortening that risks
>> collisions because you end up back where you started ...
>
>
> I think there is a misunderstanding about what was meant by collisions.  They
> should *never* happen at read time.  What I meant was: when saving a dataset,
> make sure the first 9 hex values of the new dataset uuid are not in the
> database already; if they are (a collision), generate a new one.  This gives
> us a space of about 68 billion (16^9) and shortening gets it down to 6
> characters.  Collisions will happen very rarely approaching a billion datasets.
>
>
>