Visualize Your Google Search Queries With Python and D3

PUBLISHED ON DEC 16, 2017

Ever wonder if Google knows more about you than you do? That question got me thinking: maybe we can use Google to learn something about ourselves, see how we’ve evolved, and appreciate what we’ve accomplished in a world where we seldom take the time to look back on what we’ve done.

At the end of this tutorial, you will have an interactive chart of all your Google search queries that’ll look like this:

[Image: the finished interactive chart]

You can download your data here. Click “select all” to deselect everything, then scroll down and select “searches”. Even though that’s a lot of search queries, they are all strings, so the file shouldn’t be very large. For example, I’ve used Google 79,507 times over the last 8 years, and my file was less than 2 MB zipped and 9 MB unzipped.

Now that you have the data, let’s parse it in Python. Open a terminal and cd to the Takeout directory downloaded from Google. Start by creating a new file called “parse_google.py”.

touch parse_google.py

Open it in your favorite text editor (mine is Sublime Text 3).

subl parse_google.py

Start the file with a shebang line and an encoding declaration.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Import dependencies.

import os
import json
import datetime
import operator

Walk the searches folder and combine every JSON file in it into all_searches.json.

# Only build the combined file once.
if not os.path.isfile('./searches/all_searches.json'):
  result = []
  for f in os.listdir('./searches'):
    if f.endswith('.json'):
      path = './searches/' + f
      with open(path, "rb") as infile:
        result.append(json.load(infile))

  # json.dump writes text, so open the output in text mode ("w", not "wb").
  with open("./searches/all_searches.json", "w") as outfile:
    json.dump(result, outfile)
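
If you want to try the merge step on throwaway data before pointing it at a real export, here is a minimal sketch. It assumes Python 3; the helper name `merge_json_files` and the two sample files are made up for illustration.

```python
import json
import os
import tempfile


def merge_json_files(folder, out_name="all_searches.json"):
    """Load every .json file in `folder` and write them out as one JSON list."""
    out_path = os.path.join(folder, out_name)
    merged = []
    for name in sorted(os.listdir(folder)):
        # Skip the combined file itself so a second run does not nest it.
        if name.endswith(".json") and name != out_name:
            with open(os.path.join(folder, name), "rb") as infile:
                merged.append(json.load(infile))
    with open(out_path, "w") as outfile:
        json.dump(merged, outfile)
    return out_path


# Try it on two tiny made-up files in a temporary folder.
demo_dir = tempfile.mkdtemp()
for name, payload in [("a.json", {"event": [1]}), ("b.json", {"event": [2]})]:
    with open(os.path.join(demo_dir, name), "w") as f:
        json.dump(payload, f)

with open(merge_json_files(demo_dir)) as f:
    merged_demo = json.load(f)
print(merged_demo)
```

Each input file becomes one element of the combined list, which is why the parsing step loops over the list first and over each element's events second.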

Parse the searches and store them in a dictionary keyed by date for fast access.

total_searches = 0
dates_and_queries = {}
with open('./searches/all_searches.json') as f:
  data = json.load(f)
  for i in data:
    if i.get('event'):
      events = i['event']
      for event in events:
        query = event['query']
        query_text = query['query_text']
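        # timestamp_usec is in microseconds; convert it to seconds first.
        timestamp = query['id'][0]['timestamp_usec']
        timestamp = int(timestamp) / 1000000
        # Bucket by local calendar day, formatted as YYYY/MM/DD.
        date = datetime.datetime.fromtimestamp(timestamp).strftime('%Y/%m/%d')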

        if dates_and_queries.get(date):
          dates_and_queries[date].append(query_text)
        else:
          dates_and_queries[date] = [query_text]

        total_searches += 1
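
To make the nesting easier to picture, here is a small sketch that runs the same lookups on a made-up record. The sample is only inferred from the keys the parser reads (event, query, query_text, id, timestamp_usec); it is not real Takeout data. It assumes Python 3 and uses UTC so the result is the same on every machine, whereas the script above uses your local time zone.

```python
import datetime
import json
from collections import defaultdict

# Made-up sample in the shape the parser expects (not real Takeout data).
sample = json.loads("""
[{"event": [
  {"query": {"query_text": "python json",
             "id": [{"timestamp_usec": "1513425600000000"}]}},
  {"query": {"query_text": "d3 line chart",
             "id": [{"timestamp_usec": "1513429200000000"}]}},
  {"query": {"query_text": "csv quoting",
             "id": [{"timestamp_usec": "1513512000000000"}]}}
]}]
""")

by_date = defaultdict(list)
for chunk in sample:
    for event in chunk.get("event", []):
        query = event["query"]
        # Microseconds since the epoch, stored as a string.
        seconds = int(query["id"][0]["timestamp_usec"]) / 1000000
        # UTC keeps this demo deterministic; the tutorial uses local time.
        day = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
        by_date[day.strftime("%Y/%m/%d")].append(query["query_text"])

print(dict(by_date))
```

A defaultdict replaces the if/else around the append; both approaches build the same dictionary.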

Google does not include data for days when you did not use Google, so we need to add them in. First we write a utility function to create a day window.

def create_day_window(start, finish):
  date_window = []
  delta = finish - start
  for i in range(delta.days + 1):
    day = start + datetime.timedelta(days=i)
    day = day.strftime('%Y/%m/%d')
    date_window.append(day)
  return date_window
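
A quick way to check the helper is to call it across a year boundary. The sketch below repeats the same logic in compact form so it runs on its own; the two dates are arbitrary examples.

```python
import datetime


def create_day_window(start, finish):
    """Same logic as the helper above, repeated so this snippet runs alone."""
    delta = finish - start
    return [(start + datetime.timedelta(days=i)).strftime('%Y/%m/%d')
            for i in range(delta.days + 1)]


window = create_day_window(datetime.date(2017, 12, 30),
                           datetime.date(2018, 1, 2))
print(window)
# ['2017/12/30', '2017/12/31', '2018/01/01', '2018/01/02']
```

Both endpoints are included because the loop runs over delta.days + 1 values.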

Reformat variables and call the create_day_window function.

start = min(dates_and_queries, key=str)
finish = max(dates_and_queries, key=str)

start_year, start_month, start_day = start.split('/')
finish_year, finish_month, finish_day = finish.split('/')

start = datetime.date(int(start_year), int(start_month), int(start_day))
finish = datetime.date(int(finish_year), int(finish_month), int(finish_day))

day_window = create_day_window(start,finish)
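
Taking min and max of the keys works because zero-padded YYYY/MM/DD strings sort alphabetically in the same order as the dates they represent. Here is a small sketch with made-up keys; it also shows datetime.strptime as one possible shortcut for the split-and-convert step, assuming the keys always follow that format.

```python
import datetime

dates = ["2017/12/16", "2009/03/02", "2017/01/05"]  # made-up dictionary keys

# Zero-padded year/month/day strings compare in chronological order.
first, last = min(dates), max(dates)

# strptime parses the string and .date() drops the time-of-day part.
start = datetime.datetime.strptime(first, "%Y/%m/%d").date()
finish = datetime.datetime.strptime(last, "%Y/%m/%d").date()
print(start, finish)
```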

Now we iterate through the day window and give every date that has no searches an empty list, so its search count comes out as 0.

for day in day_window:
  if not dates_and_queries.get(day):
    dates_and_queries[day] = []
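
To see the gap-filling step in isolation, here is a small sketch with made-up dates. It uses dict.setdefault as a shorthand for the same check.

```python
# Made-up data: nothing was searched on 2017/12/15.
dates_and_queries = {
    "2017/12/14": ["d3 axis"],
    "2017/12/16": ["python csv", "tail command"],
}
day_window = ["2017/12/14", "2017/12/15", "2017/12/16"]

for day in day_window:
    # Insert an empty list only when the day is missing.
    dates_and_queries.setdefault(day, [])

counts = {day: len(dates_and_queries[day]) for day in day_window}
print(counts)
# {'2017/12/14': 1, '2017/12/15': 0, '2017/12/16': 2}
```

Without this step the chart would simply skip quiet days instead of plotting them at zero.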

Finally, we sort the dates and their search queries and write them to a file called data.csv.

sorted_dates_and_queries = sorted(dates_and_queries.items(), key=operator.itemgetter(0))
text_queries = [i[1] for i in sorted_dates_and_queries]
number_of_searches = [len(i) for i in text_queries]

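import io  # io.open takes an encoding on both Python 2 and Python 3

with io.open('data.csv', 'w', encoding='utf-8') as f:
  f.write(u'date,searches,queries\n')
  for date, day_queries in sorted_dates_and_queries:
    # Double any embedded quotes so the quoted CSV field stays valid.
    queries = u', '.join(day_queries).replace(u'"', u'""')
    f.write(u'{},{},"{}"\n'.format(date, len(day_queries), queries))
  # No f.close() needed: the with block closes the file.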
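
Python's csv module can also handle the quoting and escaping for you. The following is a minimal sketch, assuming Python 3 and made-up rows; an in-memory buffer stands in for data.csv so the snippet leaves no files behind.

```python
import csv
import io

# Made-up rows: one empty day and one day whose queries contain
# a double quote and a comma.
rows = [
    ("2017/12/15", []),
    ("2017/12/16", ['python "csv" module', "d3, axis"]),
]

buffer = io.StringIO()  # stands in for open('data.csv', 'w', newline='')
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["date", "searches", "queries"])
for date, queries in rows:
    writer.writerow([date, len(queries), ", ".join(queries)])

csv_text = buffer.getvalue()
print(csv_text)

# Reading it back returns the original text, embedded quotes included.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[1]["queries"])
```

The writer only quotes fields that need it, so the output differs slightly from the always-quoted format above, but any CSV parser should read both the same way.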

The full parse_google.py script is available on GitHub.

Time to run the Python script! Go back to the terminal and run:

python parse_google.py

Use the terminal to check that you have the data.csv file.

ls

And check the last 10 days of searches (each row in the file is one day).

tail data.csv

Now you can open the CSV file and look around, which is fun, but why not chart it with D3? D3 is an incredible open source JavaScript library for producing dynamic, interactive data visualizations in web browsers. You can support their project by purchasing a sticker here.

First, let’s remove the index.html file that came with the Takeout folder.

rm index.html

And create our own.

touch index.html

Open it and we’ll get started with d3.

subl index.html

First let’s write the HTML and CSS. The rules below go inside a <style> element at the top of index.html.

body { font: 12px Arial;}

path { 
    stroke: steelblue;
    stroke-width: 2;
    fill: none;
}

.axis path,
.axis line {
    fill: none;
    stroke: grey;
    stroke-width: 1;
    shape-rendering: crispEdges;
}

.tooltip {
  position: absolute;
  width: 200px;
  height: 28px;
  pointer-events: none;
  color: black;
  font-weight: bold;
}

.datetip {
  position: absolute;
  width: 200px;
  height: 28px;
  pointer-events: none;
  color: black;
  font-weight: bold;
}

.queries {
  position: absolute;
  width: 900px;
  height: 200px;
  pointer-events: none;
  color: black;
  font-weight: bold;
}

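Now let’s add the JavaScript. We’ll be using D3, so start by loading version 4 of the library, which is the version this code is written for. The rest of the JavaScript goes inside a second <script> element after this one.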

<script src="//d3js.org/d3.v4.min.js"></script>

Set the dimensions of the canvas / graph.

var margin = {top: 30, right: 20, bottom: 30, left: 50},
  width = 1400 - margin.left - margin.right,
  height = 600 - margin.top - margin.bottom;

Parse the date / time.

var parseDate = d3.timeParse("%Y/%m/%d");

Set the ranges.

var x = d3.scaleTime().range([0, width]);
var y = d3.scaleLinear().range([height, 0]);

Define the axes.

var xAxis = d3.axisBottom(x)
  .ticks(10);
var yAxis = d3.axisLeft(y)
  .ticks(5);

Define the line.

var valueline = d3.line()
  .x(function(d) { return x(d.date); })
  .y(function(d) { return y(d.searches); });

Add the svg canvas.

var svg = d3.select("body")
  .append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", 
      "translate(" + margin.left + "," + margin.top + ")");

Set the variables.

var totalQueries = d3.select("body").append("div")
  .attr("class", "totalQueries");

var startDate = d3.select("body").append("div")
  .attr("class", "startDate");

var tooltip = d3.select("body").append("div")
  .attr("class", "tooltip");

var datetip = d3.select("body").append("div")
  .attr("class", "datetip")
  .style("opacity", 0);

var queries = d3.select("body").append("div")
  .attr("class", "queries");

var color = d3.scaleOrdinal(d3.schemeCategory20);

var timeFormat = d3.timeFormat('%B %d, %Y');

Now load the data.

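d3.csv("data.csv", function(error, data) {
  if (error) throw error;  // surface a missing or unreadable data.csv
  var searches = 0;

  data.forEach(function(d) {
    d.date = parseDate(d.date);
    d.searches = +d.searches;  // coerce the CSV string to a number
    searches += d.searches;
  });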

Scale the range of the data.

x.domain(d3.extent(data, function(d) { return d.date; }));
y.domain([0, d3.max(data, function(d) { return d.searches; })]);
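
If scales are new to you, the idea is plain linear interpolation from a data domain to a pixel range. The Python sketch below only illustrates that arithmetic; it is not D3's implementation, and `linear_scale` and the busiest-day value are made up. Note the reversed range for y: SVG's y coordinate grows downward, so a count of 0 maps to the bottom of the chart.

```python
def linear_scale(domain, rng):
    """Return a function that maps domain values onto rng linearly."""
    (d0, d1), (r0, r1) = domain, rng

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


# Same margin arithmetic as the chart's dimensions.
margin = {"top": 30, "right": 20, "bottom": 30, "left": 50}
width = 1400 - margin["left"] - margin["right"]   # 1330
height = 600 - margin["top"] - margin["bottom"]   # 540

max_searches = 120  # made-up busiest day
y = linear_scale((0, max_searches), (height, 0))
print(y(0), y(60), y(max_searches))
# 540.0 270.0 0.0
```

The time scale on the x axis works the same way, with dates converted to numbers before interpolating.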

Add the scatterplot.

svg.selectAll("dot")
  .data(data)
  .enter().append("circle")
    .attr("fill",function(d,i){return color(i);})
    .attr("r", 2.5)
    .attr("cx", function(d) { return x(d.date); })
    .attr("cy", function(d) { return y(d.searches); })
    .on("mouseover", function(d) {
      tooltip.html("Daily Queries: " + d.searches)
        .style("left", "100px")
        .style("top", "675px");
      datetip.transition()
        .duration(100)
        .style("opacity", .9);
      datetip.html(timeFormat(d.date))
        .style("left", (d3.event.pageX + 5) + "px")
        .style("top", (d3.event.pageY - 28) + "px");
      // Use text() rather than html() so search text is never parsed as markup.
      queries.text(d.queries)
        .style("left", "100px")
        .style("top", "700px");
    })
    .on("mouseout", function(d) {
      datetip.transition()
        .duration(500)
        .style("opacity", 0);
    });

Add the X and Y Axis.

svg.append("g")
  .attr("class", "x axis")
  .attr("transform", "translate(0," + height + ")")
  .call(xAxis);

svg.append("g")
  .attr("class", "y axis")
  .call(yAxis);        

And show the start date and total queries.

startDate.html("Start Date: " + timeFormat(data[0].date))
  .style("left", "100px")
  .style("top", "625px");

totalQueries.html("Total Searches: " + searches)
  .style("left", "100px")
  .style("top", "650px");

// Close the d3.csv callback opened when the data was loaded.
});

Save the file. The full index.html file is available on GitHub.

Now set up and run a local server in your Takeout folder so no one can see your weird searches. With Python 2, use SimpleHTTPServer as shown below; with Python 3, the equivalent command is python3 -m http.server.

python -m SimpleHTTPServer
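
One caveat worth knowing: by default, Python's built-in static servers listen on all network interfaces, not just localhost, so other machines on the same network may be able to reach the page. On Python 3 you can pass --bind 127.0.0.1 to python3 -m http.server, or build the server yourself. The sketch below assumes Python 3.7 or later; `make_local_server` is a made-up helper name.

```python
import functools
import http.server


def make_local_server(port=8000, directory="."):
    """Build an HTTP server for `directory` that only listens on 127.0.0.1."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    # Binding to 127.0.0.1 keeps the server unreachable from other machines.
    return http.server.HTTPServer(("127.0.0.1", port), handler)


# To serve the current folder until you press Ctrl+C:
#   make_local_server().serve_forever()
```

Run it from the Takeout folder and the URL in the next step stays the same.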

Finally, open your favorite web browser and go to http://localhost:8000/.

That’s it! Enjoy!