Accessing responder data from a Postgres-like database.
December 11, 2019
Conceptually, you can think of a table in a columnar relational database such as Redshift as similar to a spreadsheet.
The table's schema determines which columns are present, and what the types the values in those columns should take.
A database can enforce that dates are formatted as dates, and much more.
When data is written into the table, rows are created, which you can think of as new rows in a spreadsheet.
While Formsort flows also have schemas, those schemas can change over time, as new questions are added in order to collect new answers. To prevent needing to update the Redshift database columns every time a new answer is collected, answers are written into a single column,
answers, in a data format known as
JSON allows us to express objects that contain multiple values, which are referenced by their keys. So instead of needing multiple columns for the above examples, the single answers column would contain the values all together.
Note that while every row will have the same columns, not every row might have the same set of keys and values. For example, if we start asking users for their birthday, the answer to that question will only be present in the dataset from that point on.
To get answers out of the JSON column, you'd use the standard
SELECT statement that you may be familiar with.
SELECT answers FROM formsort.answer_set
But if you merely select answers, you'll get a blob of text containing the JSON object, which isn't easy to use for direct analysis:
To reach into answers you'll want to use a function called
json_extract_path_text instead, telling the database which key we want from
answers using the second parameter, in this case,
SELECT json_extract_path_text(answers, 'name') AS name FROM formsort.answer_set
You can also select multiple answers out at once by repeating the fields you'd like to select out.
SELECT json_extract_path_text(answers, 'name') AS name, json_extract_path_text(answers, 'age') AS age FROM formsort.answer_set
As mentioned earlier, when new answers are collected in a flow, they will only appear in the
answers column from that point on. The behavior of
json_extract_path_text is to return the empty string when a value is missing, so we can use that in a
WHERE clause to filter out rows where a particular answer is not set.
SELECT json_extract_path_text(answers, 'birthday') AS name, FROM formsort.answer_set WHERE json_extract_path_text(answers, 'birthday') != ''
For more detail, see the Redshift documentation on