Tropology performance: PostgreSQL vs Neo4j

Or, apples vs. papaya salad.

Short version: I’ve moved Tropology to PostgreSQL for performance reasons, and because, after some evaluation, Neo4j wasn’t as good a fit as it seemed at first blush.

You may want to start by catching up with the other Tropology articles.

All done?

Visualizing TVTropes - Part 4: Merges

Episode IV: A new approach

The last changes I made to the import process got us better performance, but at the cost of parallelization and a significantly less clean Clojure codebase.

As I was wrapping those up, I got some recommendations from Michael Hunger. Summarized:

  • Don’t use the batch REST API
  • The Cypher query endpoint is outdated
  • Try merges (see the sketch below)

Well then. So much for that feature branch.
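
To make “try merges” concrete, here’s a minimal sketch of what that looks like against Neo4j’s transactional Cypher endpoint from Clojure. This is not Tropology’s actual code: it assumes clj-http and clojure.data.json on the classpath, a local Neo4j 2.x instance, and a made-up Article label with code and url properties.

    ;; Minimal MERGE sketch against the transactional Cypher endpoint.
    ;; Assumptions: local Neo4j 2.x, an Article label with code/url
    ;; properties, and {param} parameter syntax (Neo4j 2.x style).
    (ns sketch.merge-example
      (:require [clj-http.client :as http]
                [clojure.data.json :as json]))

    (def tx-commit "http://localhost:7474/db/data/transaction/commit")

    (defn merge-article!
      "MERGE creates the node only if it doesn't exist yet, replacing
       the separate look-up-then-create round trips with one statement."
      [code url]
      (http/post tx-commit
                 {:content-type :json
                  :body (json/write-str
                         {:statements
                          [{:statement  "MERGE (a:Article {code: {code}}) SET a.url = {url}"
                            :parameters {:code code :url url}}]})}))

    ;; (merge-article! "Anime/CowboyBebop" "http://tvtropes.org/pmwiki/pmwiki.php/Anime/CowboyBebop")

The :statements vector can carry many such statements in a single request, which is what makes this endpoint a better fit for batching than the legacy ones.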

Visualizing TVTropes - Part 3: Proper batching

The story so far

Yesterday we were discussing our (admittedly) somewhat hacky, not-quite-batched Neo4j implementation.

The long and short of it is that I was initially attacking the import process as one would with a JDBC client for a relational database: query for these values, create the ones that don’t exist yet, insert the relationships, and so on.

That seems to be woefully inefficient in Neo4j.
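
Roughly, that relational-style pattern looks like the sketch below. The helper names, the Article label, and the LINKSTO relationship are illustrative assumptions rather than the project’s real schema; the thing to notice is that every lookup, node creation, and relationship insert pays for its own HTTP round trip.

    ;; Sketch of the JDBC-style import pattern -- one Cypher call (and one
    ;; HTTP round trip) per lookup, per created node, and per relationship.
    (ns sketch.naive-import
      (:require [clj-http.client :as http]
                [clojure.data.json :as json]))

    (defn run-cypher!
      "Runs a single Cypher statement against a local Neo4j instance."
      [statement parameters]
      (:body (http/post "http://localhost:7474/db/data/transaction/commit"
                        {:content-type :json
                         :as :json
                         :body (json/write-str
                                {:statements [{:statement  statement
                                               :parameters parameters}]})})))

    (defn node-exists? [code]
      (seq (get-in (run-cypher! "MATCH (a:Article {code: {code}}) RETURN a"
                                {:code code})
                   [:results 0 :data])))

    (defn import-link!
      "Up to five round trips per link: check both nodes, create the
       missing ones, then create the relationship between them."
      [from-code to-code]
      (doseq [code [from-code to-code]]
        (when-not (node-exists? code)
          (run-cypher! "CREATE (:Article {code: {code}})" {:code code})))
      (run-cypher! (str "MATCH (a:Article {code: {from}}), (b:Article {code: {to}}) "
                        "CREATE (a)-[:LINKSTO]->(b)")
                   {:from from-code :to to-code}))

Multiply that by every link on every crawled page and the latency adds up quickly.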

Visualizing TVTropes - Part 2: Neo4j optimizations

Welcome. You may want to first catch up on Part 1 of this little experiment.

All done? OK then.

Visualizing TVTropes - Part 1

Introduction

This is the first part of a series of articles on a small experiment I’m building. The intent is to crawl TVTropes and find possible relationships between tropes and the material they appear in, as well as shared concepts.

I want to be able to visualize not only a concept and the tropes it links to, but also how they relate to each other. A bonus would be the ability to query how far one concept is from another - for instance, how many steps it takes to get from Cowboy Bebop to Macross Missile Massacre.
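
As a sketch of what that “how many steps” question implies, the query below uses Cypher’s shortestPath from Clojure. The Article label, code property, LINKSTO relationship, and 15-hop cap are all assumptions for illustration, not the project’s actual schema.

    ;; Hedged sketch: hop count between two pages via shortestPath.
    ;; Assumes a local Neo4j instance and a made-up Article/LINKSTO schema.
    (ns sketch.distance
      (:require [clj-http.client :as http]
                [clojure.data.json :as json]))

    (defn steps-between
      "Number of hops on the shortest path between two pages, or nil if
       none is found within the (arbitrary) 15-hop limit."
      [from-code to-code]
      (-> (http/post "http://localhost:7474/db/data/transaction/commit"
                     {:content-type :json
                      :as :json
                      :body (json/write-str
                             {:statements
                              [{:statement
                                (str "MATCH (a:Article {code: {from}}), (b:Article {code: {to}}), "
                                     "p = shortestPath((a)-[:LINKSTO*..15]-(b)) "
                                     "RETURN length(p) AS steps")
                                :parameters {:from from-code :to to-code}}]})})
          (get-in [:body :results 0 :data 0 :row 0])))

    ;; (steps-between "Anime/CowboyBebop" "Main/MacrossMissileMassacre")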