5 changes: 4 additions & 1 deletion 02_activities/assignments/Assignment1.md
Expand Up @@ -205,5 +205,8 @@ Consider, for example, concepts of fariness, inequality, social structures, marg


```
Your thoughts...
Databases and data systems are deeply integrated into our daily lives and shape how we interact with the world, especially now that technology is ever-present. Data systems reflect the values and assumptions of their creators and, as a result, often carry built-in biases. In my everyday life, I encounter databases in various forms, from targeted advertisements on the internet to credit checks when applying for loans. As the world becomes increasingly interconnected through the internet, the scale of data creation and usage has grown immensely. This expansion demands that we remain cognisant of how we use data, so that we avoid discriminating against marginalized groups and perpetuating systemic biases.
Value systems are embedded in many databases, and their negative effects are often overlooked. Data systems such as Canada’s healthcare and tax records should aim to serve all citizens equitably. Unfortunately, these databases do not accurately represent the full diversity of our population. Healthcare databases frequently categorize gender as strictly male or female, excluding non-binary and transgender individuals from accurate representation. Similarly, tax and benefits systems are built around traditional family structures, which can disadvantage single parents or those in non-conventional family arrangements. These oversights reveal how outdated value systems limit fairness within these systems.
Social structures are further reinforced by data systems in subtle but significant ways. Employment databases and hiring platforms, for example, often prioritize conventional career trajectories, penalizing women who have taken time off to raise their families and people who have faced unexpected hardships. People who have followed non-linear career paths can struggle when they are evaluated by algorithms trained on data that reflects traditional value systems. Automated hiring systems also often favour male candidates when trained on data from industries historically dominated by men. Similarly, educational databases, such as standardized testing systems, tend to favour students from higher socio-economic backgrounds, rewarding the already privileged and failing to account for the structural barriers faced by marginalized groups. These examples highlight how technological systems trained on biased databases are active participants in shaping and perpetuating societal values.
Although overt discrimination may no longer be legal, outdated value systems are still embedded in databases and can have grave effects on certain populations. Policing and justice systems often rely heavily on historical crime data that reflects the disproportionate targeting of marginalized communities, particularly Indigenous and Black populations in Canada. Predictive policing tools trained on such data perpetuate cycles of surveillance and discrimination, embedding systemic inequities into the very algorithms that claim to be objective. Similarly, financial systems use databases to generate credit scores that continue to reflect historical biases against women and other underrepresented groups. Perhaps one of the most significant issues with databases is their failure to fully account for certain groups, which renders those groups invisible within these systems. Databases in fields like science and technology frequently underrepresent women, reinforcing the misconception that they are less capable or less interested in these areas. This invisibility not only perpetuates inequities but also hinders progress by failing to capture the full diversity of human experience.

```
181 changes: 169 additions & 12 deletions 02_activities/assignments/assignment1.sql
Expand Up @@ -2,24 +2,57 @@
/* SECTION 2 */



--SELECT
/* 1. Write a query that returns everything in the customer table. */


SELECT
*
FROM
customer;

/* 2. Write a query that displays all of the columns and 10 rows from the customer table,
sorted by customer_last_name, then customer_first_name. */


SELECT
customer_id,
customer_first_name,
customer_last_name,
customer_postal_code
FROM
customer
ORDER BY
customer_last_name,
customer_first_name
LIMIT 10;

--WHERE
/* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */
-- option 1

SELECT
*
FROM
customer_purchases
WHERE
product_id
IN
(4, 9);

-- option 2


SELECT
*
FROM
customer_purchases
WHERE
product_id
=
4
OR
product_id
=
9;
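The two options above should return the same rows. A minimal sketch to check that, assuming Python's standard `sqlite3` module and a hypothetical two-column miniature of `customer_purchases` (the sample rows are invented for illustration):

```python
import sqlite3

# Hypothetical miniature of customer_purchases; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_purchases (product_id INTEGER, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?)",
    [(4, 1), (9, 2), (5, 3), (4, 4), (1, 5)],
)

# Option 1: IN list.
with_in = conn.execute(
    "SELECT * FROM customer_purchases "
    "WHERE product_id IN (4, 9) ORDER BY customer_id"
).fetchall()

# Option 2: two equality tests joined by OR.
with_or = conn.execute(
    "SELECT * FROM customer_purchases "
    "WHERE product_id = 4 OR product_id = 9 ORDER BY customer_id"
).fetchall()

print(with_in)             # rows for product IDs 4 and 9 only
print(with_in == with_or)  # True: both filters select the same rows
```

`IN` tends to scale better for readability once the list of IDs grows beyond two or three values.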

/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty),
filtered by vendor IDs between 8 and 10 (inclusive) using either:
Expand All @@ -28,47 +61,132 @@ filtered by vendor IDs between 8 and 10 (inclusive) using either:
*/
-- option 1

SELECT
*,
(quantity * cost_to_customer_per_qty)
AS
price
FROM
customer_purchases
WHERE
vendor_id >= 8
AND
vendor_id <= 10;

-- option 2


SELECT
*,
(quantity * cost_to_customer_per_qty)
AS
price
FROM
customer_purchases
WHERE
vendor_id
BETWEEN
8 AND 10;
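`BETWEEN` is inclusive at both ends, so it matches the `>=` / `<=` version exactly. A small sketch to confirm this, assuming Python's standard `sqlite3` module and a hypothetical cut-down `customer_purchases` table with invented rows:

```python
import sqlite3

# Hypothetical miniature of customer_purchases with vendor IDs 7 through 11.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_purchases "
    "(vendor_id INTEGER, quantity REAL, cost_to_customer_per_qty REAL)"
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?)",
    [(7, 1.0, 2.0), (8, 2.0, 3.0), (9, 1.0, 4.0), (10, 3.0, 1.5), (11, 1.0, 1.0)],
)

select_price = (
    "SELECT vendor_id, quantity * cost_to_customer_per_qty AS price "
    "FROM customer_purchases WHERE {} ORDER BY vendor_id"
)

# Option 1: explicit comparisons. Option 2: BETWEEN.
comparison_rows = conn.execute(
    select_price.format("vendor_id >= 8 AND vendor_id <= 10")
).fetchall()
between_rows = conn.execute(
    select_price.format("vendor_id BETWEEN 8 AND 10")
).fetchall()

print(between_rows)                     # vendors 8, 9 and 10, boundaries included
print(comparison_rows == between_rows)  # True
```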

--CASE
/* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz.
Using the product table, write a query that outputs the product_id and product_name
columns and add a column called prod_qty_type_condensed that displays the word “unit”
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */


SELECT
product_id,
product_name,
CASE
WHEN
product_qty_type = 'unit'
THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed
FROM
product;

/* 2. We want to flag all of the different types of pepper products that are sold at the market.
add a column to the previous query called pepper_flag that outputs a 1 if the product_name
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */


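SELECT
product_id,
product_name,
-- kept from the previous query, since the task adds a column to it
CASE
WHEN
product_qty_type = 'unit'
THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed,
CASE
WHEN
LOWER(product_name)
LIKE '%pepper%'
THEN 1
ELSE 0
END AS pepper_flag
FROM
product;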
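Both `CASE` expressions can be tried on a few rows to see how the `ELSE` branch behaves. The sketch below assumes Python's standard `sqlite3` module and a hypothetical `product` table with invented rows, including one with a `NULL` quantity type to show that it falls through to `'bulk'`:

```python
import sqlite3

# Hypothetical miniature of the product table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product "
    "(product_id INTEGER, product_name TEXT, product_qty_type TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "Habanero Peppers - Organic", "lbs"),
        (2, "Jalapeno PEPPERS", "unit"),
        (3, "Sweet Corn", "unit"),
        (4, "Maple Syrup", None),  # NULL never equals 'unit', so ELSE applies
    ],
)

flags = conn.execute(
    """
    SELECT
        product_id,
        CASE WHEN product_qty_type = 'unit' THEN 'unit' ELSE 'bulk' END
            AS prod_qty_type_condensed,
        CASE WHEN LOWER(product_name) LIKE '%pepper%' THEN 1 ELSE 0 END
            AS pepper_flag
    FROM product
    ORDER BY product_id
    """
).fetchall()

for row in flags:
    print(row)
```

In SQLite, `LIKE` is already case-insensitive for ASCII characters by default, so `LOWER()` is a safeguard here rather than a requirement. It keeps the query correct if that default is changed or if the query is moved to another database.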

--JOIN
/* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the
vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */



SELECT
vendor.*,
vendor_booth_assignments.*

Just one * would return all the columns from both tables.

FROM vendor
INNER JOIN
vendor_booth_assignments
ON
vendor.vendor_id = vendor_booth_assignments.vendor_id
ORDER BY
vendor.vendor_name,
vendor_booth_assignments.market_date;

/* SECTION 3 */

-- AGGREGATE
/* 1. Write a query that determines how many times each vendor has rented a booth
at the farmer’s market by counting the vendor booth assignments per vendor_id. */


SELECT
vendor.vendor_id,
vendor.vendor_name,
COUNT
(vendor_booth_assignments.vendor_id) AS booth_rentals
FROM
vendor
INNER JOIN
vendor_booth_assignments
ON
vendor.vendor_id = vendor_booth_assignments.vendor_id
GROUP BY
vendor.vendor_id,
vendor.vendor_name
ORDER BY
booth_rentals DESC;

/* 2. The Farmer’s Market Customer Appreciation Committee wants to give a bumper
sticker to everyone who has ever spent more than $2000 at the market. Write a query that generates a list
of customers for them to give stickers to, sorted by last name, then first name.

HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */


SELECT
customer.customer_id,
customer.customer_last_name,
customer.customer_first_name,
SUM(customer_purchases.quantity * customer_purchases.cost_to_customer_per_qty) AS total_spent
FROM
customer
INNER JOIN
customer_purchases
ON
customer.customer_id = customer_purchases.customer_id
GROUP BY
customer.customer_id,
customer.customer_last_name,
customer.customer_first_name
HAVING
total_spent > 2000
ORDER BY
customer.customer_last_name,
customer.customer_first_name;
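`HAVING` filters whole groups after aggregation, whereas `WHERE` filters individual rows before it. A minimal sketch of the same pattern, assuming Python's standard `sqlite3` module and hypothetical cut-down `customer` and `customer_purchases` tables with invented rows. It repeats the aggregate expression in `HAVING` instead of using the alias, which is a portability choice and not something SQLite requires:

```python
import sqlite3

# Hypothetical miniature tables; names and amounts are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_id INTEGER, customer_first_name TEXT, customer_last_name TEXT)"
)
conn.execute(
    "CREATE TABLE customer_purchases "
    "(customer_id INTEGER, quantity REAL, cost_to_customer_per_qty REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Abel")],
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?)",
    [
        (1, 100.0, 15.0),  # 1500
        (1, 60.0, 10.0),   # 600, so customer 1 totals 2100
        (2, 200.0, 10.0),  # exactly 2000, which is not "more than 2000"
        (3, 250.0, 10.0),  # 2500
    ],
)

big_spenders = conn.execute(
    """
    SELECT c.customer_last_name, c.customer_first_name,
           SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spent
    FROM customer AS c
    INNER JOIN customer_purchases AS cp ON c.customer_id = cp.customer_id
    GROUP BY c.customer_id, c.customer_last_name, c.customer_first_name
    HAVING SUM(cp.quantity * cp.cost_to_customer_per_qty) > 2000
    ORDER BY c.customer_last_name, c.customer_first_name
    """
).fetchall()

print(big_spenders)
```

The customer whose total is exactly 2000 is excluded, which matches the "more than $2000" wording of the task.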

--Temp Table
/* 1. Insert the original vendor table into a temp.new_vendor and then add a 10th vendor:
Expand All @@ -82,19 +200,58 @@ When inserting the new vendor, you need to appropriately align the columns to be
VALUES(col1,col2,col3,col4,col5)
*/


CREATE
TEMPORARY TABLE new_vendor AS
SELECT
*
FROM
vendor;

INSERT INTO
new_vendor
(
vendor_id,
vendor_name,
vendor_type,
vendor_owner_first_name,
vendor_owner_last_name
)
VALUES
(
10,
'Thomass Superfood Store',
'Fresh Focused',
'Thomas',
'Rosenthal'
);
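When the `CREATE` and `INSERT` statements run together as one script, each needs its own terminating semicolon. The sketch below assumes Python's standard `sqlite3` module and a hypothetical one-row `vendor` table (the existing row is invented). It shows that the temp copy receives the new vendor while the original table is left untouched:

```python
import sqlite3

# Hypothetical miniature of the vendor table with a single invented row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor (vendor_id INTEGER, vendor_name TEXT, vendor_type TEXT, "
    "vendor_owner_first_name TEXT, vendor_owner_last_name TEXT)"
)
conn.execute(
    "INSERT INTO vendor VALUES (1, 'Example Farm', 'Produce', 'Ada', 'Example')"
)

# Two statements in one script, each ending in a semicolon.
conn.executescript(
    """
    CREATE TEMPORARY TABLE new_vendor AS
    SELECT * FROM vendor;

    INSERT INTO new_vendor (vendor_id, vendor_name, vendor_type,
                            vendor_owner_first_name, vendor_owner_last_name)
    VALUES (10, 'Thomass Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal');
    """
)

original_count = conn.execute("SELECT COUNT(*) FROM vendor").fetchone()[0]
temp_count = conn.execute("SELECT COUNT(*) FROM temp.new_vendor").fetchone()[0]
print(original_count, temp_count)  # the temp copy has one more row
```

A table created with `CREATE TEMPORARY TABLE` lives in SQLite's `temp` schema, so it can be addressed as `temp.new_vendor`, and it disappears when the connection closes.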

-- Date
/*1. Get the customer_id, month, and year (in separate columns) of every purchase in the customer_purchases table.

HINT: you might need to search for strftime modifiers sqlite on the web to know what the modifiers for month
and year are! */


SELECT
customer_id,
STRFTIME('%m', market_date) AS month,
STRFTIME('%Y', market_date) AS year
FROM
customer_purchases;

/* 2. Using the previous query as a base, determine how much money each customer spent in April 2022.
Remember that money spent is quantity*cost_to_customer_per_qty.

HINTS: you will need to AGGREGATE, GROUP BY, and filter...
but remember, STRFTIME returns a STRING for your WHERE statement!! */

SELECT
customer_id,
SUM(quantity * cost_to_customer_per_qty) AS total_spent
FROM
customer_purchases
WHERE
STRFTIME('%m', market_date) = '04'
AND
STRFTIME('%Y', market_date) = '2022'
GROUP BY
customer_id;
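The hint about `STRFTIME` returning a string matters because comparing its result to a bare integer matches nothing in SQLite. A minimal sketch of that behaviour, assuming Python's standard `sqlite3` module and a hypothetical cut-down `customer_purchases` table with invented rows:

```python
import sqlite3

# Hypothetical miniature of customer_purchases; dates stored as 'YYYY-MM-DD' text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_purchases (customer_id INTEGER, market_date TEXT, "
    "quantity REAL, cost_to_customer_per_qty REAL)"
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?, ?)",
    [
        (1, "2022-04-02", 2.0, 3.0),
        (1, "2022-04-09", 1.0, 4.0),
        (1, "2022-05-01", 5.0, 1.0),  # wrong month
        (2, "2021-04-03", 9.0, 9.0),  # wrong year
        (2, "2022-04-16", 3.0, 2.0),
    ],
)

april_query = """
    SELECT customer_id, SUM(quantity * cost_to_customer_per_qty) AS total_spent
    FROM customer_purchases
    WHERE STRFTIME('%m', market_date) = {month}
      AND STRFTIME('%Y', market_date) = '2022'
    GROUP BY customer_id
    ORDER BY customer_id
"""

# Comparing against the string '04' matches the April rows.
string_match = conn.execute(april_query.format(month="'04'")).fetchall()
# Comparing against the integer 4 matches nothing: text never equals an integer here.
integer_match = conn.execute(april_query.format(month="4")).fetchall()

print(string_match)
print(integer_match)
```

The zero padding matters as well: `'%m'` yields `'04'`, so comparing against `'4'` would also return no rows.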