[wdmmg-dev] Denormalization and model.mongo.Base.to_ref_dict()

Carsten Senger senger at rehfisch.de
Fri May 13 11:02:06 UTC 2011

On Thursday, May 12, 2011 23:10:39 +0200 Friedrich Lindenberg
wrote:

Hi,
On Thu, May 12, 2011 at 7:22 PM, Carsten Senger wrote:

I've written code to do the aggregations for the new APIs, which saves
information on the dataset document after the views were applied, and
that broke a test where to_ref_dict() was used to query an entry. I
fixed it here: <https://bitbucket.org/okfn/wdmmg/changeset/2547a562f441>
and did not find it used that way in real code. This makes me wonder if
we can reduce the amount of data we generate in to_ref_dict() to '_id',
'name', 'label' and 'ref' (maybe 'description' too). This reduces the
number of cases where we would have to update the entries with new
values if e.g. an entity changes, and it reduces the amount of data
stored, indexed and serialized/deserialized. Looking through the code I
found no place where we would need more information, but maybe someone
knows more.
A big +1 from me, with the added proposal of dereferencing the entities
and classifiers in the solr indexer to still get a fully denormalized
form into the index. This means people can still search for
"to.opencorporates_uri", which I don't think is an unrealistic query at all.
What do you think?
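
A rough sketch of the dereferencing step in the indexer, to make the idea concrete. This is not the actual wdmmg indexer code: dereference(), flatten() and the `entities` lookup dict are hypothetical names, and the dict stands in for a real MongoDB lookup by '_id'.

```python
# Hypothetical sketch: expand reduced references back to full documents
# before indexing, then flatten nested keys into dotted field names.

def dereference(entry, lookup, fields=('from', 'to')):
    """Replace reduced reference dicts on `entry` with the full documents.

    `lookup` maps '_id' values to full entity documents (an assumption;
    a real indexer would query the database instead).
    """
    full = dict(entry)
    for field in fields:
        ref = entry.get(field)
        if isinstance(ref, dict) and ref.get('_id') in lookup:
            full[field] = lookup[ref['_id']]
    return full


def flatten(doc, prefix=''):
    """Turn nested dicts into a flat dict with dotted keys, e.g. 'to.name'."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + '.'))
        else:
            flat[path] = value
    return flat


# Made-up example data.
entities = {
    'c1': {
        '_id': 'c1',
        'name': 'acme',
        'label': 'Acme Ltd',
        'opencorporates_uri': 'http://example.org/companies/acme',
    },
}
entry = {
    '_id': 'e1',
    'amount': 1000,
    'to': {'_id': 'c1', 'name': 'acme', 'label': 'Acme Ltd'},
}

solr_doc = flatten(dereference(entry, entities))
```

With this, the entry itself only carries the reduced reference, while the document handed to the index still has a "to.opencorporates_uri" field.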

Oh, you're right. And it's a good suggestion. We can then reduce the data
to 'id', 'name', 'label' and 'ref' (and 'color' as long as we have this
data stored and want to use it for the bubblechart).
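
As an illustration only, a reduced reference dict could be built roughly like this. The function is a standalone stand-in, not the real model.mongo.Base.to_ref_dict() method; REF_KEYS and the example document are assumptions, and the '_id' spelling follows the first mail.

```python
# Hypothetical sketch: copy only the keys needed to reference a document
# from an entry, instead of the whole document.

REF_KEYS = ('_id', 'name', 'label', 'ref', 'color')


def to_ref_dict(doc, keys=REF_KEYS):
    """Return a reduced copy of `doc` containing only the reference keys.

    Keys that are missing on the document are skipped.
    """
    return dict((key, doc[key]) for key in keys if key in doc)


# Made-up entity document; 'ref' is treated as an opaque value here.
entity = {
    '_id': 'c1',
    'name': 'acme',
    'label': 'Acme Ltd',
    'ref': 'entity/c1',
    'color': '#cc0000',
    'description': 'Long text that would otherwise be copied to every entry.',
    'opencorporates_uri': 'http://example.org/companies/acme',
}

ref = to_ref_dict(entity)
```

Changes to any other key on the entity (here 'description' or 'opencorporates_uri') would then no longer require updating the entries that reference it.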


