[ddj] Open data murder tracker in Trinidad and Tobago

Gerard Best gerardbest at gmail.com
Tue Apr 8 18:47:22 UTC 2014


WOw who are you? This is such helpful advice. Thanks comrade


On Mon, Feb 3, 2014 at 5:00 PM, Dan Nguyen <dansonguyen at gmail.com> wrote:

> Hi Gerard, cool project and thank you for sharing it. Counting homicides
> isn't easy and the data structure problems you ran into are common to most
> such projects (Slate had a great writeup on tracking general gun deaths:
> http://www.slate.com/articles/news_and_politics/crime/2013/12/newtown_anniversary_what_slate_learned_from_trying_and_failing_to_record.single.html
> )
> To solve your immediate problem, I would *do away with the "Toll"* column.
> That column, as you've realized, will always fluctuate as murders become
> classified/discovered/re-classified. Your spreadsheet should not have any
> set order...because with a spreadsheet, you can reorder by column as you
> wish, and you want to maintain that flexibility.
>
> Now, in terms of creating a unique ID, I think your idea could work fine.
> The key thing is that once you have set that unique ID *it must never
> change*. It should be a value that has no real meaning, i.e. you'll never
> show it outside of this spreadsheet. But ti's the key that you can use for
> your own internal reference's sake.
>
> An easy thing to do is to use this formula:
>
> =CONCATENATE(TEXT(NOW(), "yymmddHHmmss"), "000", RANDBETWEEN(10000,99999))
>
> This will take the current time and turn it into a string, which will be
> unique enough if you enter in entries at a rate *slower* than one a
> second. But just to make sure, we add a random 5 digit number to the end.
>
> Now, this *unique id* should *never change*. So once you've generated
> this number, *copy it, then Paste Special -> Paste Values only*
>
> This will replace the *formula* that was in the cell with just the
> literal value, i.e. that random number.
>
>
>
> That's one way to do it. It doesn't even have to be that complicated, you
> just want a reasonably good way of making sure all the numbers are
> different, and the timestamp is reliable since your computer time (should)
> never repeat itself.
>
>
> *Other tips:*
>
> - *You should maintain a field that simply records when you first entered
> the data*. This is of course different than when the murder actually
> happened. For even better record-keeping, you can have a second column that
> you update with the current date anytime you make a change to an existing
> record.
> *   Tip*: If you click on a cell and then hit *Cmd-;   *the current day
> will be entered into the field
>
> *- You should avoid keeping multiple tabs: *Instead of having separate
> tabs for "unclassified", "classified", "police killings", etc., you should
> put *everything into one sheet* and add a *new column* in which the value
> is either "Classified", "Unclassified", etc. The most important thing about
> this is that it keeps you from having to alter 3-5 different tables every
> time you make a change to the headers. When you just need a certain kind of
> death, you filter by that category column.
>
> - *You should avoid removing rows - *If you take something out because of
> a technicality...again, it's better to make a separate column...maybe
> called *"Confirmed?" *that it is either *true/false. *
>
> It's always easier to filter columns, and *extremely hard* to reconstruct
> data. You want to reduce the amount of clicking you have to do to organize
> data in order to minimize the mistakes you make.
>
>
> Adrian Holovaty, of Django and ChicagoCrime.org, has a great series of
> articles on this topic of data integrity, though he's talking about a much
> larger data set with different constraints. But the insights and advice he
> has may still be useful:
>
> https://source.opennews.org/en-US/learning/sane-data-updates-are-harder-you-think-part-2/#what-do-when-there-no-primary-key
>
>
> Hope that helps!
> - Dan
>
>
>
>
>
>
>
> On Fri, Jan 31, 2014 at 1:09 PM, Gerard Best <gerardbest at gmail.com> wrote:
>
>> Hello all,
>>
>> INTRO
>> I'm on a team working on a open data journalism project called Bullet
>> Points and I'm hoping you can lend your collective experience/expertise to
>> the project.
>>
>> Bullet Points tracks murders in Trinidad and Tobago. The murders are
>> tracked on a Google doc here: http://tinyurl.com/bulletpoints2014
>>
>> The fatal incidents are listed chronologically and are numbered by victim
>> (murder toll). The victims are announced on Twitter (
>> twitter.com/bulletpoints_) by their toll number. This system has some
>> serious limitations.
>>
>> PROBLEM
>> Firstly, it is geared toward tracking the murder victims but is an
>> inadequate way of tracking the actual killings themselves. For example,
>> when there is a double murder, the toll goes up by two but the number of
>> incidents goes up by only one.
>>
>> Secondly, killings do not always occur in the same chronological order
>> that the victims' bodies are discovered. Sometimes bodies of murder victims
>> are discovered after several days. When this happens, the relevant killing
>> must be inserted somewhere in the middle of the existing chronological
>> list, which obviously changes the toll count of all subsequent victims.
>> Unfortunately, because our current system relies on the toll count to
>> identify the victim, it has now run into this serious limitation.
>>
>> PROPOSED SOLUTION
>> I think that what I need to do is add another column that contains a
>> unique reference number for each incident. The unique reference number can
>> follow a standard protocol such as [DATE][INTEGER]. For example, the most
>> recent murder, which was discovered today, would have a reference number
>> 2014013001. I think this will deal with the problem but I am not sure it
>> is the best solution.
>>
>> Thanks for taking the time to read this. If you think you can help, I
>> would be grateful for your perspective on this problem and proposed
>> solution.
>>
>> --
>> ---
>> Gerard Best
>>
>> _______________________________________________
>> data-driven-journalism mailing list
>> data-driven-journalism at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> Unsubscribe:
>> https://lists.okfn.org/mailman/options/data-driven-journalism
>>
>>
>
> _______________________________________________
> data-driven-journalism mailing list
> data-driven-journalism at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/data-driven-journalism
> Unsubscribe: https://lists.okfn.org/mailman/options/data-driven-journalism
>
>


-- 
---
Gerard Best
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-driven-journalism/attachments/20140408/47b350bc/attachment-0003.html>


More information about the data-driven-journalism mailing list