I'm using a MongoDB database to keep track of analytics for an application. I'm writing a Clojure application (using clj-time and Monger) to get data out of the database.
I have a collection containing records like
{"_id": ObjectId(...), timestamp: ISODate("2013-06-01T15:18:37Z"), device: "04dbf04b6dc0d0a4fd383967b3dc62f50111e07e"}
Each different device represents a different user of the service. I'd like to find out how many (unique) users I have each day, with the caveat that I'd like "day" to refer to the US/Central time zone, taking daylight saving time into account. (If that weren't a requirement, I think $group and distinct would do it.)
Here's what I've been doing:
    (ns analytics.reporting
      (:use [monger.core :only [connect! connect set-db! get-db]]
            monger.operators
            clj-time.core
            clj-time.periodic
            clj-time.format)
      (:require [monger.collection :as mc]))

    (defn to-central [dt]
      (from-time-zone dt (time-zone-for-id "America/Chicago")))

    (defn count-distinct [coll]
      (count (distinct coll)))

    (defn daily-usage [ndays]
      (let [midnights (map to-central
                           (reverse (for [offset (map days (range ndays))]
                                      (minus (to-central (today-at 0 0)) offset))))
            by-day (for [midnight midnights]
                     (mc/find-maps "devices"
                                   {:timestamp {$gte midnight
                                                $lt (plus midnight (days 1))}}))
            devices-by-day (map #(map :device %) by-day)
            distinct-devices-by-day (map count-distinct devices-by-day)]
        distinct-devices-by-day))
If you can't read Clojure, this says: get a list of the most recent N midnights in the Central time zone, and run Mongo queries to find all of the records between each successive pair of midnights. Then, count the number of distinct device values within each day.
Here's what I don't like about this approach:
- Running a separate query for each day (I usually look at 30 days at a time) feels wrong; this should be done on the database side instead of the application side.
- Counting distinct device values should also be done by the database.
- My server is set to the UTC time zone, so if it's after midnight in UTC but before midnight in Central time, the last entry in this list is always zero. This is easy enough to patch over, but I'd prefer a solution that's smart enough to prevent it in the first place.
- This entire function takes about 500 ms to run. That isn't awful (I'm the only one who runs this query, and only once or twice per day), but it seems like this operation shouldn't take that long.
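The UTC-versus-Central problem in the list above appears to come from taking "today" in UTC and then relabeling that midnight as Central time. One way to avoid it, sketched here under assumptions, is to convert the current instant to Central first and only then take the start of the day. This sketch uses java.time through plain Java interop (Java 8 or later) instead of clj-time so that it runs without extra dependencies; the function name recent-central-midnights is made up for illustration.

```clojure
(import '(java.time Instant ZoneId))

;; Assumption: "US/Central" means the America/Chicago tz database zone,
;; as in the question's own code.
(def central (ZoneId/of "America/Chicago"))

(defn recent-central-midnights
  "Returns the n most recent Central-time midnights at or before `now`,
  oldest first, as java.time.Instant values."
  [^Instant now n]
  ;; Find the calendar date in Central first, so a UTC clock that has
  ;; already rolled over to the next day cannot produce a future midnight.
  (let [today (.toLocalDate (.atZone now central))]
    (vec (for [offset (range (dec n) -1 -1)]
           ;; atStartOfDay resolves the zone offset per date, so days on
           ;; either side of a daylight saving change get the right instant.
           (.toInstant (.atStartOfDay (.minusDays today offset) central))))))

;; 03:00 UTC on June 2 is still the evening of June 1 in Central time,
;; so the latest midnight returned is the start of June 1 (05:00 UTC).
(println (map str (recent-central-midnights
                   (Instant/parse "2013-06-02T03:00:00Z") 2)))
```

Each pair of neighboring instants could then serve as the $gte and $lt bounds of one day's query, although that still leaves one query per day.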
Is there a way I can shove more of this logic into the MongoDB query?
As suggested by @wiredprairie, I ended up including the Central-time date in each record that I add to the database. Then I was able to use a trivial $group query to gather the number of records for each date.
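A minimal sketch of what that could look like, not the code that was actually used: the field name central_date, the helper names, and the exact pipeline are all assumptions for illustration. It again uses java.time interop rather than clj-time, and assumes the timestamp arrives as a java.util.Date. The pipeline shown groups twice so that it counts distinct devices per day rather than raw records.

```clojure
(import '(java.time Instant ZoneId)
        '(java.util Date))

(def central (ZoneId/of "America/Chicago"))

(defn central-date-string
  "Calendar date (yyyy-MM-dd) in Central time for a java.util.Date."
  [^Date ts]
  (str (.toLocalDate (.atZone (.toInstant ts) central))))

(defn with-central-date
  "Adds a hypothetical :central_date field to a record before it is stored."
  [record]
  (assoc record :central_date (central-date-string (:timestamp record))))

;; Aggregation stages as plain Clojure data. The first $group collapses
;; repeated (day, device) pairs; the second counts the devices left per day.
(def daily-unique-devices-pipeline
  [{"$group" {"_id" {"day" "$central_date" "device" "$device"}}}
   {"$group" {"_id" "$_id.day" "users" {"$sum" 1}}}
   {"$sort" {"_id" 1}}])

;; With Monger 1.x this would presumably be run as
;;   (mc/aggregate "devices" daily-unique-devices-pipeline)
;; which is not executed here because it needs a live database.

(println (with-central-date
           {:timestamp (Date/from (Instant/parse "2013-06-02T03:00:00Z"))
            :device "04dbf04b6dc0d0a4fd383967b3dc62f50111e07e"}))
```

Storing the date as a yyyy-MM-dd string keeps the grouping key simple and makes lexical sorting match chronological order. Records written before the field existed would need a one-time backfill.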