diff --git a/02_activities/assignments/Assignment2.md b/02_activities/assignments/Assignment2.md index a95a027fd..acb64b6cf 100644 --- a/02_activities/assignments/Assignment2.md +++ b/02_activities/assignments/Assignment2.md @@ -45,16 +45,29 @@ There are several tools online you can use, I'd recommend [Draw.io](https://www. **HINT:** You do not need to create any data for this prompt. This is a conceptual model only. +##### Answer: + + #### Prompt 2 We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD. +##### Answer: + + #### Prompt 3 The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2? **HINT:** search type 1 vs type 2 slowly changing dimensions. +##### Answer: + + ``` -Your answer... +From my research, a type 1 model would overwrite the old customer address with the new one, while a type 2 model would retain the old addresses as historical records. +In my opinion, in a bookstore database, it would be more useful to overwrite the old addresses with new ones, since keeping data that is not +utilized would be redundant. However, if address types are not being considered (i.e., home, billing or shipping), a type 1 model might +result in the loss of important data. In my model, I have incorporated another table to define address types -- this can come in handy if a +customer's home and billing addresses are different, so the correct address type can be updated when needed. ``` *** @@ -182,5 +195,20 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c ``` -Your thoughts... +The assigned article discusses the limits of automation and the reliance on human labour to train neural nets. It is +well known that major fast fashion brands rely on human workers who are paid very low wages and work in oppressive conditions. 
+On top of the conditions that make sewing difficult to automate (robots lack the necessary dexterity, and it is hard to train +automated models to keep up with new styles), the current system prefers this exploitative approach to maximize profits. Further, +training datasets and reliable human coding of these datasets are imperative to building neural nets like large language models (LLMs). +The performance and outputs of these models are only as good as their training datasets. If the implicit racial bias and sexist +stereotypes that the trainers hold impact the coding of the training datasets, this introduces implicit biases into the models that +utilize the training sets. For example, Google's Vision AI became the subject of online discourse when the model was labelling +a hand-held device differently based on skin tone -- if a Black person was holding the item, it was labelled as a "gun"; in contrast, +it was labelled as a "monocular" when a white person was holding it. Google Translate is another example, where translating from +a gender-neutral language such as Turkish to English results in a stereotypical generalization of professions (i.e., a sentence +referring to a doctor uses he/him pronouns when the input sentence does not indicate gender). Overall, minimizing bias and bigotry +in technology and automated models boils down to people acknowledging their implicit biases and addressing them through +further discussion and better education. + +On a side note, the robot attempting to fold a towel really mirrors my struggles putting a duvet cover on. ``` diff --git a/02_activities/assignments/assignment2.sql b/02_activities/assignments/assignment2.sql index 5ad40748a..03a0f27ee 100644 --- a/02_activities/assignments/assignment2.sql +++ b/02_activities/assignments/assignment2.sql @@ -20,7 +20,9 @@ The `||` values concatenate the columns into strings. Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed. 
All the other rows will remain the same.) */ - +SELECT +product_name || ', ' || COALESCE(product_size,' ') || ' (' || COALESCE(product_qty_type, 'unit') || ')' +FROM product; --Windowed Functions /* 1. Write a query that selects from the customer_purchases table and numbers each customer’s @@ -32,17 +34,39 @@ each new market date for each customer, or select only the unique market dates p (without purchase details) and number those visits. HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */ +-- option with ROW_NUMBER +SELECT +customer_id +,market_date +,ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS num_of_visits +FROM customer_purchases; /* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit. */ - +SELECT +customer_id +,market_date +FROM ( + SELECT + customer_id + ,market_date + ,ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS most_recent_visit + FROM customer_purchases +) x +WHERE most_recent_visit = 1; /* 3. Using a COUNT() window function, include a value along with each row of the customer_purchases table that indicates how many different times that customer has purchased that product_id. */ +SELECT DISTINCT +customer_id +,product_id +,COUNT(*) OVER (PARTITION BY customer_id, product_id) AS customer_purchase_count +FROM customer_purchases +ORDER BY customer_id, product_id; -- String manipulations @@ -57,10 +81,22 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */ +SELECT +product_name +,CASE WHEN INSTR(product_name,'-') > 0 + THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL + END AS description +FROM product; /* 2. 
Filter the query to show any product_size value that contain a number with REGEXP. */ +SELECT +product_name +,product_size +FROM product +WHERE product_size REGEXP '[0-9]'; -- UNION @@ -73,7 +109,34 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling 3) Query the second temp table twice, once for the best day, once for the worst day, with a UNION binding them. */ - +SELECT +market_date +,daily_sales +,rank AS [rank] +,'the min' AS [preserve] +FROM ( + SELECT DISTINCT + market_date + ,SUM(quantity * cost_to_customer_per_qty) AS daily_sales + ,RANK() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) ASC) AS rank + FROM customer_purchases + GROUP BY market_date +) x +WHERE rank = 1 + +UNION + +SELECT * +,'the max' AS [preserve] +FROM ( + SELECT DISTINCT + market_date + ,SUM(quantity * cost_to_customer_per_qty) AS daily_sales + ,RANK() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) DESC) AS rank + FROM customer_purchases + GROUP BY market_date +) x +WHERE rank = 1; /* SECTION 3 */ @@ -89,7 +152,33 @@ Think a bit about the row counts: how many distinct vendors, product names are t How many customers are there (y). Before your final group by you should have the product of those two queries (x*y). */ - +WITH vendor_product AS ( + SELECT DISTINCT + vendor_name + ,product_name + ,original_price + FROM vendor_inventory AS vi + INNER JOIN vendor AS v + ON vi.vendor_id = v.vendor_id + INNER JOIN product AS p + ON vi.product_id = p.product_id +), +big_customer_sales AS ( + SELECT + vendor_name + ,product_name + ,original_price + ,customer_id + FROM vendor_product + CROSS JOIN customer +) +SELECT +vendor_name +,product_name +,SUM(5 * original_price) AS surge_earnings +FROM big_customer_sales +GROUP BY vendor_name, product_name +ORDER BY vendor_name, product_name; -- INSERT /*1. Create a new table "product_units". This table will contain only products where the `product_qty_type = 'unit'`. 
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`. Name the timestamp column `snapshot_timestamp`. */ +DROP TABLE IF EXISTS product_units; +CREATE TABLE product_units AS +SELECT p.* +FROM product AS p +WHERE product_qty_type = 'unit'; +ALTER TABLE product_units +ADD snapshot_timestamp TIMESTAMP; /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp). This can be any product you desire (e.g. add another record for Apple Pie). */ - +INSERT INTO product_units +VALUES(10, 'Eggs', '1 dozen', 6, 'unit', CURRENT_TIMESTAMP); -- DELETE /* 1. Delete the older record for the whatever product you added. HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/ - +DELETE FROM product_units +--SELECT * FROM product_units -- just for testing purposes +WHERE product_id = 10 +AND snapshot_timestamp IS NULL; -- UPDATE /* 1.We want to add the current_quantity to the product_units table. @@ -128,6 +228,35 @@ Finally, make sure you have a WHERE statement to update the right row, you'll need to use product_units.product_id to refer to the correct row within the product_units table. When you have all of these components, you can run the update statement. 
*/ +ALTER TABLE product_units +ADD current_quantity INT; - +-- part one, getting the last quantity per product +DROP TABLE IF EXISTS last_quantity_per_product; +CREATE TEMP TABLE last_quantity_per_product AS + SELECT + product_id + ,quantity + FROM ( + SELECT * + ,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY market_date DESC) AS most_recent_day + FROM vendor_inventory + ) x + WHERE most_recent_day = 1; -- create a temp table with the most recent quantity of each product in vendor_inventory + +-- part two, left join to add current quantity values to view the nulls +SELECT * +FROM product_units AS pu +LEFT JOIN last_quantity_per_product AS lqpp + ON pu.product_id = lqpp.product_id; + +-- part three, actual update +UPDATE product_units AS pu +-- set current_quantity to the most recent quantity, or 0 if null +SET current_quantity = COALESCE(( -- use COALESCE to replace any null values with 0 + SELECT + quantity + FROM last_quantity_per_product AS lqpp + WHERE lqpp.product_id = pu.product_id +), 0); diff --git a/02_activities/assignments/assignment_2_bookstore-prompt1.png b/02_activities/assignments/assignment_2_bookstore-prompt1.png new file mode 100644 index 000000000..512eeaa95 Binary files /dev/null and b/02_activities/assignments/assignment_2_bookstore-prompt1.png differ diff --git a/02_activities/assignments/assignment_2_bookstore-prompt2.png b/02_activities/assignments/assignment_2_bookstore-prompt2.png new file mode 100644 index 000000000..e758b4075 Binary files /dev/null and b/02_activities/assignments/assignment_2_bookstore-prompt2.png differ diff --git a/02_activities/assignments/assignment_2_bookstore-prompt3.png b/02_activities/assignments/assignment_2_bookstore-prompt3.png new file mode 100644 index 000000000..e6b765a09 Binary files /dev/null and b/02_activities/assignments/assignment_2_bookstore-prompt3.png differ