[ckan-discuss] Despamming datahub

Mark Wainwright mark.wainwright at okfn.org
Tue Mar 19 12:45:34 GMT 2013


Sure, but in combination with whatever else is in place. It's a very
great deal less work than at present.

Mark

On 19/03/2013, Ross Jones <ross at servercode.co.uk> wrote:
> I think that's a lot of work for someone to take on. Minimising the amount
> of work involved in keeping the site clean should be what we're aiming for.
>
> Ross
>
> From: Mark Wainwright
> Sent: Tuesday, 19 March 2013 12:39 PM
> To: Ross Jones
> Cc: Velichka Dimitrova; CKAN discuss
> Subject: Re: [ckan-discuss] Despamming datahub
>
> Here's a suggestion I made about spam control a while ago. Spam users
> are heuristically identified, and an admin user can list them a
> screenful at a time (like your spam mail folder). You check or
> uncheck the tick boxes to confirm which ones are spammers and hit the
> big red 'nuke' button. It deletes and purges the spam users and all
> their created groups and datasets, and undoes any revisions they've
> made elsewhere (as long as the dataset hasn't been edited later).
>
> What do you think?
>
> Mark
>
> On 19/03/2013, Ross Jones <ross at servercode.co.uk> wrote:
>> Definitely think a demo account with a simple password is a good plan.
>> Setting up the demo account should be straightforward, and running a
>> task to delete the data every 4 hours (where that data is more than 4
>> hours old) equally so. The verified accounts would be the only potential
>> stumbling block to doing this. Also, I'd love to see a feature where
>> accounts can have logins enabled/disabled as well, to make it easier to
>> manage.
>>
>> Ross.
>>
>>
>> On Tue, Mar 19, 2013 at 10:26 AM, Velichka Dimitrova <
>> velichka.dimitrova at okfn.org> wrote:
>>
>>> I'd also support some higher entry barrier for long-term deposits of
>>> metadata or data.
>>>
>>> In terms of being open - I think it is valuable to have a site which
>>> users can access and upload data or links to straight away in order to
>>> see what the possible features and benefits are. Maybe this could be
>>> some demo feature - where such "experimental uploads" are deleted after
>>> several hours.
>>>
>>> For groups and communities which are using / would like to use Datahub as
>>> an actual portal and metadata repository, some user verification process
>>> should not be a problem; it would not reduce openness in my opinion.
>>>
>>> Velichka Dimitrova
>>> Open Economics Project Coordinator
>>> Open Knowledge Foundation
>>> http://okfn.org | http://openeconomics.net
>>>
>>>
>>>
>>>
>>> On 19 March 2013 10:17, Ross Jones <ross at servercode.co.uk> wrote:
>>>
>>>> No problem, I wrote some code to do it. It isn't a perfect solution: a
>>>> lot of the users creating these groups need to be deleted, and I am
>>>> nervous about doing that without better heuristics than their choice of
>>>> email provider.
>>>>
>>>> I think it might be better to force users through either an email
>>>> verification process, or to provisionally put their first group/dataset
>>>> creation into a pending state for moderation. I realise this would
>>>> reduce how open the site is, but I can't see another viable way to
>>>> reduce all the spam.
>>>>
>>>> Ross
>>>>
>>>>
>>>>
>>>> On Tue, Mar 19, 2013 at 9:22 AM, Velichka Dimitrova <
>>>> velichka.dimitrova at okfn.org> wrote:
>>>>
>>>>> Excellent work, Ross! I agree with Mark.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Velichka Dimitrova
>>>>> Open Economics Project Coordinator
>>>>> Open Knowledge Foundation
>>>>> http://okfn.org | http://openeconomics.net
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 18 March 2013 15:28, Mark Wainwright
>>>>> <mark.wainwright at okfn.org> wrote:
>>>>>
>>>>>> This is great, Ross! Thanks for this. The spam groups have been
>>>>>> disfiguring the DataHub for a while. Now the groups page looks much
>>>>>> happier:
>>>>>>
>>>>>> http://datahub.io/group
>>>>>>
>>>>>> >> Would it be possible for someone to turn off group-creation until
>>>>>> >> the datahub gets migrated to 2.0? I guess any urgently needed
>>>>>> >> groups could approach the list to ask in the meantime.
>>>>>>
>>>>>> Hopefully one of the devs will step in ...
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On 18/03/2013, Ross Jones <ross at servercode.co.uk> wrote:
>>>>>> > Hi Sara,
>>>>>> >
>>>>>> > It shouldn't be an issue on 1.x; it depends how open you want it to
>>>>>> > be, but as it is probably best to be working with 2.0 now...
>>>>>> >
>>>>>> > It should certainly be less of an issue on 2.0, unless you leave it
>>>>>> > open to anybody to create a group. I guess ideally for open systems
>>>>>> > it should be possible to have the group go into a pending state for
>>>>>> > verification before it is created. I haven't checked whether this is
>>>>>> > the case or not yet. Even if not in core it should be possible to do
>>>>>> > this in an extension.
>>>>>> >
>>>>>> > Ross
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 18, 2013 at 2:38 PM, Sara Farmer
>>>>>> > <sara.farmer at btinternet.com> wrote:
>>>>>> >
>>>>>> >> Phew... Does that mean that spammers using fake groups won't be an
>>>>>> >> issue in 2.0?
>>>>>> >>
>>>>>> >> I'm asking because I just set up my own CKAN node and am clunking my
>>>>>> >> way through getting it working for me... I locked it down because I
>>>>>> >> saw all the spam on the datahub, and wasn't quite sure what the best
>>>>>> >> way to avoid the same fate was (and as the recipient of more than
>>>>>> >> one "we've closed your site because of spammers" message from
>>>>>> >> providers in the past, this scares me a little more than most).
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >>
>>>>>> >>
>>>>>> >> Sj.
>>>>>> >>
>>>>>> >>
>>>>>> >> On 3/18/2013 10:04 AM, Ross Jones wrote:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> I've started de-spamming the datahub. There were *lots* of fake
>>>>>> >> groups created, and as there are some fairly easy heuristics for
>>>>>> >> identifying them (thanks hotmail) I've written a script that'll mark
>>>>>> >> them all as deleted.
>>>>>> >>
>>>>>> >> I'm only soft-deleting them (just in case) and unfortunately users
>>>>>> >> don't have that option so I've erred on the side of caution and
>>>>>> >> temporarily left them (until I can come up with a safer set of
>>>>>> >> rules).
>>>>>> >>
>>>>>> >> Would it be possible for someone to turn off group-creation until
>>>>>> >> the datahub gets migrated to 2.0? I guess any urgently needed
>>>>>> >> groups could approach the list to ask in the meantime.
>>>>>> >>
>>>>>> >> Ross
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> _______________________________________________
>>>>>> >> ckan-discuss mailing list
>>>>>> >> ckan-discuss at lists.okfn.org
>>>>>> >> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>>>>> >> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-discuss
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
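The 'nuke' step Mark suggests in the thread (purge confirmed spam users and
everything they created, and undo their edits elsewhere unless the dataset
has been edited later) could be sketched roughly as below. This is only an
illustration of the decision logic: Revision, plan_nuke and the plain
dictionaries are hypothetical names made up for the sketch, not CKAN models
or API calls, and the heuristics and the admin's tick-box confirmation are
assumed to have already produced the set of confirmed spammers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for one edit to a dataset (not a CKAN model)."""
    dataset: str
    author: str
    seq: int  # position in the dataset's history; higher means later


def plan_nuke(confirmed, creators, revisions):
    """Plan the clean-up for a set of admin-confirmed spam users.

    confirmed: set of user names the admin ticked as spammers
    creators:  dict mapping group/dataset name -> user who created it
    revisions: iterable of Revision objects

    Returns (purge, revert, skipped):
      purge   - objects created by spammers, to be deleted and purged
      revert  - spam revisions with no later legitimate edit, safe to undo
      skipped - spam revisions followed by a legitimate edit, left alone
    """
    purge = sorted(name for name, user in creators.items() if user in confirmed)
    purged = set(purge)

    # Latest non-spam edit per dataset.
    last_clean = {}
    for rev in revisions:
        if rev.author not in confirmed:
            last_clean[rev.dataset] = max(last_clean.get(rev.dataset, -1), rev.seq)

    revert, skipped = [], []
    for rev in revisions:
        if rev.author not in confirmed or rev.dataset in purged:
            continue  # legitimate edit, or the whole object is going anyway
        if rev.seq > last_clean.get(rev.dataset, -1):
            revert.append(rev)
        else:
            skipped.append(rev)
    return purge, revert, skipped
```

Returning a plan rather than deleting straight away keeps the destructive
part separate, so the list can be shown to the admin before anything is
purged.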

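The periodic clean-up Ross mentions (delete demo data every 4 hours, where
that data is more than 4 hours old) comes down to an age check. A minimal
sketch, assuming each demo upload carries a creation timestamp: DemoItem and
split_expired are hypothetical names for illustration, not part of CKAN, and
the scheduling itself (cron or similar) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DemoItem:
    """Hypothetical record for something uploaded via the demo account."""
    name: str
    created: datetime


MAX_AGE = timedelta(hours=4)  # the 4-hour figure from the thread


def split_expired(items, now, max_age=MAX_AGE):
    """Return (expired, kept); expired items are strictly older than max_age."""
    expired = [item for item in items if now - item.created > max_age]
    kept = [item for item in items if now - item.created <= max_age]
    return expired, kept
```

Passing "now" in explicitly, instead of reading the clock inside the
function, makes the cut-off easy to test.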

