[ckan-dev] dataset urls

Wed Apr 25 13:51:57 UTC 2012

On Wed, Apr 25, 2012 at 1:19 PM, Toby Dacre <toby.okfn at gmail.com> wrote:

> As a compromise that allows us both nice urls and permanent urls
>
> *non-permanent* urls as currently
>
> /dataset/my-nice-name
>
> *permanent* urls have an extra piece of info that would be fixed for a
> dataset
>
> /dataset/~a4fG2
> /dataset/~a4fG2/my-nice-name <- name bit optional and can be anything
>
> using a-z A-Z 0-9 -_ we have 64 chars so 5 letters gives us 1,073,741,824
> combinations which should do
>
> This would allow not too ugly permanent urls.
>
>
> 1) Would these be acceptable?
>

+1
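
To make the two url forms concrete, here is a minimal sketch of how a request
path could be told apart. The function name parse_dataset_url and the two
regular expressions are made up for illustration and are not existing CKAN
code. The only things taken from the proposal are the "~" marker, the
5-character id drawn from a-z A-Z 0-9 -_ and the optional trailing name.

```python
import re

# Hypothetical patterns, not CKAN routes.
# Permanent form: "~" + 5 url-safe characters, optionally followed by a
# free-form name segment that is ignored when looking up the dataset.
PERMANENT = re.compile(r"^/dataset/~([A-Za-z0-9_-]{5})(?:/[^/]*)?$")
# Non-permanent form: a plain name that does not start with "~".
BY_NAME = re.compile(r"^/dataset/([^~/][^/]*)$")


def parse_dataset_url(path):
    """Return ("short_id", id) or ("name", name), or None if neither matches."""
    match = PERMANENT.match(path)
    if match:
        return ("short_id", match.group(1))
    match = BY_NAME.match(path)
    if match:
        return ("name", match.group(1))
    return None


# Size of the id space with 64 symbols and 5 positions.
ID_SPACE = 64 ** 5
```

Because the name segment after the id is never used for lookup, it can go
stale after a rename without breaking the link.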

>
> 2) If so, where would we use which?
>

For the moment, anywhere we give out our url to a third-party service,
including Disqus and the social stuff.  We can make the default in the url
bar the non-permanent one, but it would be nice when adding a bookmark to
somehow force in the permanent one.

> Toby
>
>
>
> On 25 April 2012 09:13, David Raznick <kindly at gmail.com> wrote:
>
>>
>>
>> On Wed, Apr 25, 2012 at 1:24 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>
>>> On 25 April 2012 01:07, David Raznick <kindly at gmail.com> wrote:
>>> >
>>> >
>>> >> We are talking about comment links atm.
>>> >>
>>> >> Re name change you get broken
>>> >> links on github if you rename your repo.
>>> >>
>>> >> We could also implement a
>>> >> simple redirector by looking up in the dataset revision table for old
>>> >> names :-)
>>> >
>>> >
>>> > We considered this but it would not work unless we were sure that all
>>> > names are unique forever.
>>>
>>> But why does this need to be perfect? If someone renames and then some
>>> other dataset replaces it fine - o/w this would work :-)
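
A rough sketch of the redirector idea under the "does not need to be perfect"
rule: the current holder of a name always wins, and only if nobody holds the
name now do we fall back to the most recent dataset that used it. The table
and column names (dataset, dataset_revision, revision_timestamp) and the
function resolve_name are hypothetical stand-ins, not the real CKAN schema,
and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3


def resolve_name(conn, name):
    """Return (dataset_id, is_redirect), or None if the name was never used."""
    # A dataset that currently has this name takes priority.
    row = conn.execute(
        "SELECT id FROM dataset WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0], False
    # Otherwise look for the latest revision that carried this name.
    row = conn.execute(
        "SELECT dataset_id FROM dataset_revision WHERE name = ? "
        "ORDER BY revision_timestamp DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row:
        return row[0], True
    return None


def demo_connection():
    """Build a tiny in-memory database: dataset d1 was renamed once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dataset_revision (
            dataset_id TEXT, name TEXT, revision_timestamp TEXT);
        INSERT INTO dataset VALUES ('d1', 'new-name');
        INSERT INTO dataset_revision VALUES ('d1', 'old-name', '2012-04-01');
        INSERT INTO dataset_revision VALUES ('d1', 'new-name', '2012-04-20');
    """)
    return conn
```

When is_redirect is true the caller would issue a redirect to the dataset's
current url. If another dataset later takes the old name, the old link simply
points at the new holder, which is the imperfect case discussed above.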
>>>
>>
>>> I'm not sure we are debating the same thing here. dataset uuid can be used
>>> for things that absolutely need to be permanent forever (e.g. rdf uris,
>>> permanent identifiers for syncing). But for other stuff it's not the
>>> end of the world if something breaks (if that is rare and people are
>>> warned of risk)
>>>
>>> >>
>>> >> > changing *HARD* because the last thing we want to do is confront users
>>> >> > with more choices than necessary.  They should not be forced to think
>>> >>
>>> >> Why shouldn't it be like github repos. You can change but you are
>>> >> warned about problems. Pick a good name.
>>> >
>>> >
>>> > If we cared that much about the name we would not sluggify the title
>>> > and force people to make good ones.  Github forces you to do this.
>>>
>>> We used to do this. I have pushed several times for making dataset
>>> name sluggification better (remove article, warning people if long
>>> ...). Github btw now has something similar to what we do.
>>>
>>> >> through the consequences of their actions and read some blurb as to why
>>> >> it's bad if we can make it avoidable.
>>> >
>>> >> Understood. That said I frequently type in the names of familiar
>>> >> datasets (but i may be unusual). That's never possible once we have
>>> >> somewhat random id in there. But that's then a question of usage. I
>>> >> think DataHub at least is more like GitHub (or Twitter) in that
>>> >> regard: I care about this entity's name a lot (compared to say
>>> >> StackOverflow where I always arrive via google or similar).
>>> >>
>>> > I think that the relevance of the name has much less consequence to us
>>> > than github but more than stackoverflow.  I am happy to keep the ability to
>>>
>>> OK, interesting. I don't see it that way so much.
>>>
>>> > reference by name only in the url, but not give that out when
>>> > systematically creating permanent links, like in this case.
>>>
>>> To repeat: the disqus system will reference the permanent identifier
>>> and the disqus_url is, IIRC, a convenience (used from recent comments
>>> etc).
>>>
>>> > This can include things like activity streams, social stuff, apis to
>>> > update qa information, feeds, and anywhere we give out urls that we expect
>>> > other services to use. We could use just uuids for these but some would
>>> > also benefit from being less ugly.
>>>
>>> There are two distinct discussions:
>>>
>>> * Do we want uuids exposed for datasets (by default) in the UI. I'm
>>> saying no, you're saying yes :-)
>>>
>>
>>> * Do we want uuids exposed for datasets elsewhere (e.g. in activity
>>> streams, qa etc). Probable agreement ...
>>>  * I'm not quite sure what this means. Internally we ref the dataset
>>> object. Hence we can always change the url link at least in our system
>>> as this updates. For some things like RSS feeds given out to others
>>> this is more problematic (and i'd be happy with uuid/{friendly-name}).
>>>
>>> * All agree: If we have to use uuids we can make them less ugly (e.g.
>>> by appending title). I'm concerned about shortening that risks
>>> collisions because you end up back where you started ...
>>
>>
>> I think there is a misunderstanding about what was meant by collisions.
>> They should *never* happen at read time.  What I meant was: when saving a
>> dataset, make sure the first 9 hex digits of the new dataset's uuid are not
>> already in the database; if they are (a collision), generate a new uuid.
>> This gives us a space of about 68 billion (16^9), and the shortened form is
>> down to 6 characters.  Collisions will happen very rarely, even approaching
>> a billion datasets.
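
A minimal sketch of that save-time check, to show the arithmetic. It assumes
the 6 characters come from the same 64-symbol alphabet Toby proposed (9 hex
digits are 36 bits, which is exactly 6 symbols of 6 bits each). The names
encode_prefix, new_dataset_uuid and taken_prefixes are hypothetical, and the
in-memory set stands in for what would really be a unique column in the
database.

```python
import string
import uuid

# 64 url-safe symbols: A-Z a-z 0-9 - _
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_prefix(hex_prefix):
    """Encode a 9-digit hex prefix (36 bits) as 6 symbols of 6 bits each."""
    value = int(hex_prefix, 16)
    chars = []
    for _ in range(6):
        chars.append(ALPHABET[value & 63])  # lowest 6 bits first
        value >>= 6
    return "".join(reversed(chars))


def new_dataset_uuid(taken_prefixes, make_uuid=uuid.uuid4):
    """Generate uuids until the first 9 hex digits are unused.

    taken_prefixes is a set standing in for the database lookup (assumption).
    Returns the full uuid and its 6-character short form.
    """
    while True:
        candidate = make_uuid()
        prefix = candidate.hex[:9]
        if prefix not in taken_prefixes:
            taken_prefixes.add(prefix)
            return candidate, encode_prefix(prefix)
```

Since uniqueness is enforced when the dataset is saved, a short id maps to at
most one dataset, so nothing can collide at read time.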
>>
>>
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>
>>