[ckan-dev] CKAN showing up in Google searches - Language used

Adrià Mercader adria.mercader at okfn.org
Tue Feb 3 09:46:58 UTC 2015


Hi,

The following snippet should do the trick for adding a rel="canonical"
link into the dataset page <head> section and should be a good
starting point either for adding it to your own extension templates or
for submitting a PR if someone feels like it (in which case it will
probably be a good idea to extend it to organizations, groups and
other relevant pages):



diff --git a/ckan/templates/package/read_base.html b/ckan/templates/package/read_base.html
index 4fc1039..022fdfc 100644
--- a/ckan/templates/package/read_base.html
+++ b/ckan/templates/package/read_base.html
@@ -5,6 +5,8 @@
 {% block links -%}
   {{ super() }}
   <link rel="alternate" type="application/rdf+xml" href="{{ h.url_for(controller='package', action='read', id=pkg.id, format='rdf', qualified=True) }}"/>
+
+  <link rel="canonical" href="{{ h.url_for(controller='package', action='read', id=pkg.name, qualified=True, locale='default') }}"/>
 {% endblock -%}

 {% block head_extras -%}
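
For illustration only, here is a rough standalone Python sketch of the result the canonical link is meant to produce: the same dataset URL with any language prefix removed. It is not CKAN code. The function name canonical_url and the LOCALES set are made up for this example, and it assumes that language versions differ from the default only by a leading /<locale>/ path segment.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical subset of the locales enabled on a site (an assumption).
LOCALES = {"ar", "de", "es", "fr", "pt_BR", "zh_CN"}


def canonical_url(url, locales=LOCALES):
    """Return url with a leading /<locale>/ path segment removed, if present."""
    parts = urlsplit(url)
    # "/ar/dataset/x" splits into ["", "ar", "dataset", "x"].
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1] in locales:
        segments.pop(1)
    path = "/".join(segments) or "/"
    # Keep the query string, drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(canonical_url("http://example.org/ar/dataset/house-mouse"))
```

Only a whole first path segment is compared against the locale list, so a path that merely starts with the same letters as a locale code is left alone.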


Hope this helps,

Adrià


On 3 February 2015 at 07:08, Stefan Oderbolz <stefan.oderbolz at liip.ch> wrote:
> Btw, there is a very good Google Webmaster article describing how this
> can be done: http://googlewebmastercentral.blogspot.ch/2010/09/unifying-content-under-multilingual.html
>
> On Mon, Feb 2, 2015 at 11:40 PM, Stefan Oderbolz
> <stefan.oderbolz at liip.ch> wrote:
>> Hi there,
>>
>> while you could use the robots.txt "trick" to exclude URLs, it's not a very
>> nice practice, because then the content is basically lost for Google et al.
>>
>> The preferred way to handle this is by using canonical links (see
>> https://support.google.com/webmasters/answer/139066?hl=en). This is a way to
>> tell a search engine your preferred URL for content and even specify that
>> each page exists in several languages.
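
As a sketch of the "exists in several languages" part: search engines read per-language alternates from <link rel="alternate" hreflang="..."> tags. The Python below builds such tags for one page. The function name hreflang_links, the example URLs and the assumption that the default locale has no URL prefix are illustrative only and are not taken from CKAN.

```python
from html import escape


def hreflang_links(base_url, path, locales, default_locale="en"):
    """Build one <link rel="alternate" hreflang=...> tag per locale."""
    links = []
    for locale in locales:
        # Assumption: the default locale is served without a URL prefix.
        prefix = "" if locale == default_locale else "/" + locale
        href = base_url.rstrip("/") + prefix + path
        # hreflang values use hyphens (pt-BR); the locale codes in the
        # URLs here use underscores (pt_BR).
        lang = locale.replace("_", "-")
        links.append(
            '<link rel="alternate" hreflang="%s" href="%s"/>'
            % (escape(lang), escape(href))
        )
    return links


for tag in hreflang_links("http://example.org", "/dataset/x", ["en", "ar", "pt_BR"]):
    print(tag)
```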
>>
>> Afaik this has not (yet) been implemented in CKAN. It would definitely be a
>> very welcome patch or extension.
>>
>> Regards Stefan
>>
>> On Feb 2, 2015 3:17 AM, "Alex (Maxious) Sadleir" <maxious at gmail.com> wrote:
>>>
>>> You can patch robots.txt in the CKAN source to exclude languages from
>>> search results (while users can still switch on the site unless you
>>> also disable languages in the config). I also exclude /_tracking
>>>
>>> diff --git a/ckan/public/robots.txt b/ckan/public/robots.txt
>>> index 279a33a..e410bdc 100644
>>> --- a/ckan/public/robots.txt
>>> +++ b/ckan/public/robots.txt
>>> @@ -3,6 +3,50 @@ Disallow: /dataset/rate/
>>>  Disallow: /revision/
>>>  Disallow: /dataset/*/history
>>>  Disallow: /api/
>>> +Disallow: /_tracking
>>> +Disallow: /_tracking
>>> +
>>> +Disallow: /ar/
>>> +Disallow: /bg/
>>> +Disallow: /ca/
>>> +Disallow: /cs_CZ/
>>> +Disallow: /da_DK/
>>> +Disallow: /de/
>>> +Disallow: /dv/
>>> +Disallow: /el/
>>> +Disallow: /en_AU/
>>> +Disallow: /en_GB/
>>> +Disallow: /es/
>>> +Disallow: /es_AR/
>>> +Disallow: /fa_IR/
>>> +Disallow: /fi/
>>> +Disallow: /fr/
>>> +Disallow: /hu/
>>> +Disallow: /id/
>>> +Disallow: /is/
>>> +Disallow: /it/
>>> +Disallow: /ja/
>>> +Disallow: /km/
>>> +Disallow: /ko_KR/
>>> +Disallow: /lt/
>>> +Disallow: /lv/
>>> +Disallow: /my_MM/
>>> +Disallow: /nl/
>>> +Disallow: /no/
>>> +Disallow: /pl/
>>> +Disallow: /pt_BR/
>>> +Disallow: /ro/
>>> +Disallow: /ru/
>>> +Disallow: /sk/
>>> +Disallow: /sl/
>>> +Disallow: /sq/
>>> +Disallow: /sr/
>>> +Disallow: /sr_Latn/
>>> +Disallow: /sv/
>>> +Disallow: /tr/
>>> +Disallow: /uk_UA/
>>> +Disallow: /zh_CN/
>>> +Disallow: /zh_TW/
>>>
>>>  User-Agent: *
>>>  Crawl-Delay: 10
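
A quick way to sanity-check rules like these before deploying them is Python's standard urllib.robotparser. The sketch below uses a shortened, hypothetical rule set modelled on the patch above. The helper name is_crawlable and the example URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, shortened rule set modelled on the patch (an assumption).
RULES = """\
User-agent: *
Disallow: /_tracking
Disallow: /ar/
Disallow: /de/
"""


def is_crawlable(url, rules=RULES, agent="*"):
    """Return True if the given rules allow agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable("http://example.org/ar/dataset/house-mouse"))
print(is_crawlable("http://example.org/dataset/house-mouse"))
```

The standard-library parser matches rules as plain path prefixes, so this check is only meaningful for rules without wildcards.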
>>>
>>> On Mon, Feb 2, 2015 at 12:59 PM, Aaron McGlinchy
>>> <McGlinchyA at landcareresearch.co.nz> wrote:
>>> > Hi, our instance of CKAN now has datasets showing up in Google searches,
>>> > which is great.  However, I have noticed that the link which comes up
>>> > in the Google search often takes the user to a 'non-default' language
>>> > version of the dataset or resource.  I.e. our language is English, but a
>>> > search for, say, "house mouse data" returns as the number 1 result one of
>>> > our resources, but with the language as Arabic.  This is perfectly fine if
>>> > the user doing the search wants the Arabic language interface, but not
>>> > quite so user friendly if the user wants the English interface.
>>> >
>>> > Is there anything that can be done to influence this behaviour (without
>>> > removing language options that some other users might wish to use)?
>>> >
>>> > Thanks
>>> > Aaron
>>> >
>>> > _______________________________________________
>>> > ckan-dev mailing list
>>> > ckan-dev at lists.okfn.org
>>> > https://lists.okfn.org/mailman/listinfo/ckan-dev
>>> > Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>
>
> --
> Liip AG  // Limmatstrasse 183 //  CH-8005 Zürich
> Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch


