This week I spent far too much of my spare time trying to track down where on earth I was "leaking" memory in my trivial Clojure code.
Parsing was fairly easy: a neat pipeline that reads the file line
by line, splits it on tabs, cleans up the columns, parses
integers, and so on. Thanks to lazy sequences and the
->> macro it all
looks almost trivial:
(defn load-data-file
  "General (for any geoplanet data file) loading file function"
  ([file-name]
   ;; load without any mappings
   (load-data-file file-name []))
  ([file-name mappings]
   (binding [duck-streams/*default-encoding* "UTF-8"]
     (->> file-name
          duck-streams/read-lines
          (map (fn [line] (str-utils2/split line #"\t")))
          (map (fn [line] (map maybe-strip line)))
          (parse-data-file file-name)
          (map (fn [dict]
                 (reduce (fn [dict mapper] (mapper dict))
                         dict
                         mappings)))))))
There are some extra functions wrapping these to read the three types of files (places, adjacencies, and aliases) but that is irrelevant for the discussion here.
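For readers new to the threading macro, here is a minimal sketch of the idea (using the modern clojure.string namespace rather than the older str-utils2, and made-up sample lines): ->> threads each result in as the last argument of the next form, so a split/clean pipeline reads top to bottom.

```clojure
(require '[clojure.string :as str])

;; ->> threads the value through as the *last* argument of each form:
(->> ["  foo\tbar " " baz\tqux "]   ; stand-in for read-lines output
     (map str/trim)                 ; clean up surrounding whitespace
     (map #(str/split % #"\t")))    ; split each line on tabs
;; => (["foo" "bar"] ["baz" "qux"])
```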
Writing data is also trivial thanks to congomongo. We partition the
data we read into batches of 1000 to push into the DB, printing a
dot per batch as visual feedback since this will take ages (ignore
munge-place, which only adds some extra fields I want):
(defn store-places [places-list collection-name]
  (doseq [some-places (partition 1000 1000 (map munge-place places-list))]
    (congo/mass-insert! collection-name some-places)
    (print ".")))
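One caveat worth noting as an aside: partition with these arguments silently drops a trailing chunk shorter than the batch size, so the last few records of a file whose length is not a multiple of 1000 would never be inserted. partition-all keeps the short tail. A tiny illustration:

```clojure
;; partition drops an incomplete trailing chunk...
(partition 3 3 (range 7))    ;; => ((0 1 2) (3 4 5))  -- the 6 is lost
;; ...while partition-all keeps it:
(partition-all 3 (range 7))  ;; => ((0 1 2) (3 4 5) (6))
```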
However, combining the two, as in (store-places (load-places-file
"7.4.1/uk_places.tsv" false) "uk"), soon blew up with an
exhausted-heap exception. Yet the equivalent inlined version
(doseq [x (partition 1000 1000  (map munge-place-to-dict (load-places-file "7.4.1/uk_places.tsv" false)))] (congo/mass-insert! "uk" x) (print "."))
completed without problems. I guess the output of
load-places-file is a lazy sequence and the
store-places function holds onto the head of that sequence (it is the
places-list parameter), so the already-processed elements cannot be
garbage collected while we iterate over the rest. It took me ages to
find this out. The first urge to factor your code into nice small
pieces is not always the best way to go.
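The same head-retention trap can be demonstrated with nothing but range: binding the lazy sequence to a name that stays in scope keeps every realized element reachable, while consuming it anonymously lets the JVM collect elements as it goes. This is a sketch of the principle with small numbers, not the congomongo setup above.

```clojure
;; Danger: `all-nums` names the head of the lazy seq, so while reduce
;; walks it every realized element stays reachable. With a large
;; enough range this is exactly the exhausted-heap failure.
(def all-nums (map inc (range 1000)))   ; imagine (range 100000000)
(reduce + all-nums)                     ; => 500500

;; Safe: the seq is consumed anonymously, so elements that have
;; already been summed can be garbage collected as reduce proceeds.
(reduce + (map inc (range 1000)))       ; => 500500
```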