Mauricio Featherman, Ph.D.

Featherman’s T-SQL and Analytics Adventures – From 0 to Hero

This page holds the content that together forms Featherman’s T-SQL Adventures and Featherman’s Analytics Adventures. The content is designed to use SQL Server Management Studio (SSMS) to transfer considerable T-SQL skill and perspective to you. PowerBI and Tableau projects are also used to build some of those skills and to explain the SQL. Microsoft’s Report Builder and PowerBI are used to demonstrate how SQL stored procedures integrate with report development, and how to automate the reporting and dashboard refresh process. The content is worth your time because in your career you may have to gather and organize data on a regular basis, and compiled, good data is needed to make better-informed business decisions. Currently many business functions cannot rely on an IT department to provide datasets or reports, so they have to do the job themselves. This content demonstrates how to automate the report generation process so that the analyst or manager can spend their time solving problems rather than manually integrating, organizing, and presenting data with copy/paste inside Excel.

The 10 modules below develop the skills and perspective needed by DBAs and business analysts: data management, data integration, data analysis, and data visualization. While not posted here, the course assignments give you a business problem and a dataset and ask you to analyze the problem and provide suggestions. The modules below include .docx documents and videos to help you quickly gain T-SQL skills. The training .docx files, which you can freely download, all follow the same layout: the left-hand column contains SQL code that you can copy and run as an SSMS query, which makes the documents interactive as the queries retrieve data into an in-memory dataset. The right-hand column contains a picture of the output you should receive when you type (or copy) and run the SQL query, along with a conceptual explanation of the processing and other usage information. Together, the documents form a textbook. In many cases pivot charts, maps, and other data visualizations are displayed to convey the usefulness of the SQL procedure. The documents are evolving and being updated, so please share your feedback on areas that could use clarification and revision.

Supplemental learning materials include Featherman’s videos and industry reference documents. Learning how to manage and shape data is important and fun. You can even experience the exhilarating, satori-like feeling of mastery over data.

To perform the learning:

If you are a WSU student, watch the first video below to connect to the cb-ot-devst05.ad.wsu.edu database mentioned in the T-SQL primer document. After you watch the first video, work through these steps:

Setup: Download a digital copy of the .docx document in the first module. It may help your learning to print a copy, use colored pens to categorize the learning content, and take personal notes explaining what is going on.

Introductory understanding: Open the .docx file and copy/paste the queries from the first column of the MS Word document into a new SSMS query, then press the red exclamation-point icon to execute the query and see the interactive results. Read the explanation in the right-hand column so you can make sense of the content and master it quickly. Run each query, build your knowledge, and experiment.

Master-level comprehension: Go back to the beginning of the module and type in each query, making modifications to the dimensions and metrics that interest you. One dataset can tell many different stories. Modify the queries to play, build skills, and gain a repertoire of data management routines.

While the course is focused on the SQL way of performing data tabulations, at times you will use Excel and Tableau to make maps, pivots, area charts, or combo charts. T-SQL is the lingua franca of data. The course focuses on the SQL SELECT statement, used to compile datasets for analytics investigation.

Tip:

Do not go to the next module until you can explain the current module to your mom or dad or some other person. Very often students fail a technical job interview because, while they can hack out code, they cannot explain the functionality in a conceptual manner.

If you can explain a module you learned by speaking in conceptual sentences (no code), then you probably understand the content. Many students fail technical interviews because they cannot really explain what they have been studying. So get in the habit of writing summary statements and communicating them, and keep refining your understanding and communication of the concepts until you can easily make someone else understand the concepts and the usefulness of your code. Many a student has failed a technical interview because they only sort of knew what they were talking about and could not communicate it.

OK, let's get started!

Glossary – here are some materials to help decipher the rest of the content and put the course material in context
1. BI Glossary – 2019
2. More Glossary
3. ETL discussion

Software and music
1. SSMS – Please install SSMS so that you can run the queries; here is the link from Microsoft.
2. VPN – When connecting to a WSU database you must first open a VPN connection and authenticate yourself. Click here to install the VPN software.

Recommended music for coding sessions. 

Relaxing Zen Music 
Japanese Zen Garden Music

The most important advice of the semester


Module 1 – Introduction to SQL – this module introduces you to writing and running SQL queries in SSMS. You retrieve and create datasets of transaction data (not summarized data). You learn how to filter the data retrieved and how to create columns of derived, calculated measures (aka new columns of metrics or analytics). You also learn how to JOIN tables together so that you can include columns from different tables in your resultset. Typically you need to join 3–5 tables, so learning this process early in the course is critical. Be sure you understand joins conceptually at a very deep level; joins are a standard whiteboard interview question.
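As a preview, here is a minimal sketch of such a query. It filters rows, derives a calculated column, and joins two tables; the table and column names assume the AdventureWorksDW2014 sample database referenced later in the course:

    -- Retrieve reseller transactions with a derived profit column,
    -- joining the fact table to a dimension table for product names.
    SELECT p.EnglishProductName,
           f.OrderQuantity,
           f.SalesAmount,
           f.SalesAmount - f.TotalProductCost AS GrossProfit  -- derived, calculated measure
    FROM FactResellerSales AS f
    INNER JOIN DimProduct AS p
            ON f.ProductKey = p.ProductKey                    -- join on the shared key
    WHERE f.SalesAmount > 1000;                               -- filter the rows retrieved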

  1. Documentation  – Featherman-TSQL-Primer-8-26-20
  2. Videos – There is a video series here that will assist you. Watch videos 1-4
  3. IMPORTANT: You can learn a great deal about coding by watching the videos and reading the .docx documents. While useful, this conceptual understanding does not mean you know how to program. To learn how to program you simply have to type and click the mouse until you can create functionality. The way to learn, for every module, is to split your computer screen (or plug your notebook computer into a TV, or use dual monitors): start SSMS on one screen and play the video on the second screen. Pause the video every few minutes and re-create the same program. This is a form of guided learning. Next, run each of the queries in the .docx files and investigate the results, reading the .docx file for explanations. Only by building a coding foundation through practice can you learn the concepts. When you understand the concepts, then you can start the assignment. If you skip the learn-by-typing approach and go straight from watching videos to the assignments, the resulting frustration was your personal choice.

In the video below, when asked for a password, be sure to click on the database tab first.
Then use user ID mfstudent, password BIanalyst
(the letters B and I, not B1). BI is the acronym for Business Intelligence.

Here is a video designed to get you started learning using Featherman’s TSQL Primer

This next video shows how to join tables so that you can retrieve columns of data from two or more tables when building datasets and reports. More in-depth joins are covered in the next module.

This next video further explains the granularity (level of detail) of the rows of data in datasets. We also build data hierarchies.

  1. Other Training Materials
    a) People like this website to learn simple SQL commands
    b) Play the SQL matchup game


Module 2 – Summarizing data, integrating datasets, adding time/date columns – this module introduces the first data-condensing process: aggregating transaction data into summarized datasets. Transaction data is aggregated into higher-level datasets based on one or more dimensions, such as city/state, and you build many columns of metrics in the process. This module further covers integrating tables of measures (called fact tables) with tables of categorical, textual data (called dimension tables); we practice JOIN operations. Finally, this module introduces date/time analytics, where you derive new date/time dimensions from a date column. For example, you can add new columns that report the day-of-week number and name, day of month, week number, month number, and month name. This content forms the basis of managerial reporting. A strength of this approach is that you can build a dataset column by column, adding many columns of metrics for each dimension specified.
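To give the flavor of the module, here is a sketch of a GROUP BY query that condenses reseller transactions into one summary row per state, building several metric columns at once (again assuming AdventureWorksDW2014 names):

    -- Aggregate transaction rows into one summary row per state.
    SELECT g.StateProvinceName,
           COUNT(*)             AS [# Transactions],
           SUM(f.OrderQuantity) AS [Total Units],
           SUM(f.SalesAmount)   AS [Total Sales],
           AVG(f.SalesAmount)   AS [Avg Sale]
    FROM FactResellerSales AS f
    INNER JOIN DimReseller  AS r ON f.ResellerKey  = r.ResellerKey   -- fact to dimension
    INNER JOIN DimGeography AS g ON r.GeographyKey = g.GeographyKey  -- dimension to dimension
    GROUP BY g.StateProvinceName
    ORDER BY [Total Sales] DESC;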

Here is Featherman’s suggestion for building columns of analytics: the FU VW C PRR approach (to remember it, think VW cheated on the carbon emissions tests for their diesel cars and should change to electricity that makes the car purr). Please suggest more ways to analyze business performance.

You can use these and other analytical approaches when you perform research. Can you guess which category of metrics marketers, operations types, and financial types focus on? (A sketch after the list shows several of these metrics computed in a single query.)

  • Frequency – how often do the transactions occur, how often does the customer transact, or the product get produced? Marketers often use the 80/20 rule to focus on their frequent-flyer customers.
  • Units – how many units are being produced, sold, and shipped? Operations/production/shipping/supply chain professionals focus on unit sales by SKU or by product category.
  • Velocity – how fast are transactions occurring? 5 per quarter vs. 5 per month vs. 5 per week? Is the velocity changing (speeding up or slowing down)?
  • Width – (aka breadth) how many different items are on one invoice, or how many different products are being produced in one period? This is not analyzing the composition of the market basket but, more superficially, an analysis of the number of line items different customer segments purchase on average.
  • Change – how does this period compare to prior periods (aka turnover analysis)? Examples are month-over-month (MOM) and year-over-year (YOY) analysis to see change. Useful in retail settings to examine racks, end caps, and shelves.
  • Profitability – how much profit after cost of goods sold? Today’s profit is tomorrow’s payroll.
  • Recency – analyze the dimension using the ‘what have you done for me lately’ mindset. This perspective can investigate the top 10 customers, machines, employees, product lines, etc., for the past month or past six months; this top 10 can be quite different from the historical top 10. You can look for positive signs of performance, negative indicators, and changes in trends.
  • Revenue – the value of the transactions (careful: revenue is not profit).
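Here is the promised sketch: several of these metric categories computed per reseller in a single GROUP BY query. AdventureWorksDW2014 names are assumed; velocity and change need the period-over-period techniques of later modules:

    -- Several of the FU VW C PRR metrics, one summary row per reseller.
    SELECT r.ResellerName,
           COUNT(DISTINCT f.SalesOrderNumber)       AS [Frequency (# orders)],
           SUM(f.OrderQuantity)                     AS [Units],
           CAST(1.0 * COUNT(*) / COUNT(DISTINCT f.SalesOrderNumber)
                AS DECIMAL(6,2))                    AS [Width (avg lines per order)],
           SUM(f.SalesAmount)                       AS [Revenue],
           SUM(f.SalesAmount - f.TotalProductCost)  AS [Profit],
           MAX(f.OrderDate)                         AS [Recency (last order date)]
    FROM FactResellerSales AS f
    INNER JOIN DimReseller AS r ON f.ResellerKey = r.ResellerKey
    GROUP BY r.ResellerName;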

Documentation

  1. Featherman-TSQL-GROUP-BY-Queries-12-22-21 – GROUP BY queries, INNER JOIN, DATEPART() functions, and Excel pivot chart reporting
  2. 3 table joins with GROUP BY – this document shows a few ways the fact tables are connected to the dimension tables in AdventureWorksDW2014.

Videos – There is a video series here that will assist you. Watch videos 5,6,9,10

The next video demonstrates a lot of GROUP BY functionality and calculated columns.

PowerBI Intro

The next video demonstrates the PowerBI software and uses a no-SQL, drag-and-drop approach to produce the same reports as in a prior video. This comparison with PowerBI and Excel is provided to aid your understanding of SQL database data modeling. Here PowerBI, Excel, and Power Pivot inside Excel are used to build the 3-table data model (geography, resellers, sales) needed for reporting. The calculated total sales for Lavendar Bay is confirmed across corporate-standard reporting technologies: SSMS, PowerBI, and Excel. A variety of data visualizations are demonstrated: pivot tables, pivot charts, matrix reports, Bing maps, and column charts. Each calculates the totals similarly, providing the same results. Taking a quick look at pivot and matrix tables also paves the way for the next module, which is designed to get you, the budding analyst, to think tabular.

 

This next video demonstrates creating calculated columns, and then demonstrates the ease of building many different datasets by changing only the GROUP BY dimension.

This next video demonstrates how to use DATEPART(), DATENAME(), and PowerBI’s date/time intelligence features to create new columns of attributes that are useful for building line charts and area charts to compare data over time.

DatePart DateName document

 

This last video focuses on using DATEPART(), YEAR(), MONTH(), etc., as dimensions that group the data by a time period. The connection between DATEPART() and line or area charts is made. DATEDIFF() is also demonstrated. This video completes module 2.
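A short sketch of the pattern the video demonstrates, assuming AdventureWorksDW2014 names:

    -- Group sales by derived date parts to compare months across years.
    SELECT YEAR(f.OrderDate)                            AS [Year],
           DATEPART(month, f.OrderDate)                 AS [Month #],
           DATENAME(month, f.OrderDate)                 AS [Month Name],
           SUM(f.SalesAmount)                           AS [Total Sales],
           AVG(DATEDIFF(day, f.OrderDate, f.ShipDate))  AS [Avg Days to Ship]
    FROM FactResellerSales AS f
    GROUP BY YEAR(f.OrderDate), DATEPART(month, f.OrderDate), DATENAME(month, f.OrderDate)
    ORDER BY [Year], [Month #];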


Module 3 – Cross-tabulating data – this module introduces pivot tables (in Excel) and cross-tab datasets (in SSMS). Managers love tables of values that slice a measure by two criteria, forming a tabular, Excel-like data output. For example, sales by city are sliced into months or years, or sales are sliced into counties and the cities within them. GROUP BY queries are superior in that you can create many columns of metrics per dimension, whereas (best practice says) you can have only one measure per cross-tab; PIVOT() tables, however, are also very powerful because you can condense huge amounts of data into a compact table at any level of granularity needed. Both data-compaction tools are heavily used, sometimes in conjunction: for example, a first data pass using PIVOT(), then a second scan of the data using further GROUP BY condensing. The table output from PIVOT() is used for data visualizations, most commonly line and area charts to show trends over time.
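Here is a sketch of the PIVOT() syntax, slicing state sales into year columns (AdventureWorksDW2014 names assumed; the hard-coded year list is exactly what the Dynamic SQL material below automates):

    -- Cross-tab: one row per state, one column of total sales per year.
    SELECT StateProvinceName, [2012], [2013], [2014]
    FROM (SELECT g.StateProvinceName,
                 YEAR(f.OrderDate) AS OrderYear,
                 f.SalesAmount
          FROM FactResellerSales AS f
          INNER JOIN DimReseller  AS r ON f.ResellerKey  = r.ResellerKey
          INNER JOIN DimGeography AS g ON r.GeographyKey = g.GeographyKey) AS src
    PIVOT (SUM(SalesAmount) FOR OrderYear IN ([2012], [2013], [2014])) AS pvt;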

Documentation

  1. GROUP-BY-vs.-PIVOT-queries
  2. Pivot-Transformations 9-10-21


Video Training – intro to PIVOT() queries


This next video shows how to perform the prior analysis using an Excel pivot table

Creating a pivot chart in Excel using data stored in Power Pivot (which is a data warehouse inside Excel)

Building the list of columns using Dynamic SQL 

Documentation – Dynamic SQL – this is the base and improved query from the video
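As a sketch of the idea (not the exact query from the video): build the PIVOT column list from the data itself, then assemble and run the query. Note that STRING_AGG() requires SQL Server 2017 or later; older servers use the FOR XML PATH technique instead:

    DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

    -- Build a bracketed, comma-separated list of the distinct order years.
    SELECT @cols = STRING_AGG(QUOTENAME(OrderYear), ', ') WITHIN GROUP (ORDER BY OrderYear)
    FROM (SELECT DISTINCT CAST(YEAR(OrderDate) AS VARCHAR(4)) AS OrderYear
          FROM FactResellerSales) AS y;

    -- Splice the list into the PIVOT query, then execute it.
    SET @sql = N'SELECT StateProvinceName, ' + @cols + N'
    FROM (SELECT g.StateProvinceName, YEAR(f.OrderDate) AS OrderYear, f.SalesAmount
          FROM FactResellerSales AS f
          JOIN DimReseller  AS r ON f.ResellerKey  = r.ResellerKey
          JOIN DimGeography AS g ON r.GeographyKey = g.GeographyKey) AS src
    PIVOT (SUM(SalesAmount) FOR OrderYear IN (' + @cols + N')) AS pvt;';

    EXEC sp_executesql @sql;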

 

 


Module 4 – Categorizing data records – The analyst needs to solve problems, but often there is too much data, so they cannot see the problem or hints to the solution. This module shows how to categorize data so that it can be filtered more intelligently. You are shown how to discretize data into equal-sized bins (such as quartiles) using NTILE(). This functionality is often used in conjunction with RANK(), which ranks an attribute (such as sales rep) based on some measure(s). CASE processing is also used to add a new column of derived data containing a textual term such as ‘top customer’ or ‘emerging customer’. The NTILE()- or CASE()-generated term is then used in slicers and drop-down parameter lists to categorize and then filter data more easily, allowing greater focus on segments of data. DBAs and analysts use these tools daily.
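A small sketch of NTILE() and RANK() working together (AdventureWorksDW2014 names assumed):

    -- Rank salespeople by total revenue and split them into quartiles.
    SELECT e.LastName,
           SUM(f.SalesAmount)                               AS [Total Sales],
           RANK()   OVER (ORDER BY SUM(f.SalesAmount) DESC) AS [Sales Rank],
           NTILE(4) OVER (ORDER BY SUM(f.SalesAmount) DESC) AS [Quartile]   -- 1 = top 25%
    FROM FactResellerSales AS f
    INNER JOIN DimEmployee AS e ON f.EmployeeKey = e.EmployeeKey
    GROUP BY e.LastName;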

This module shows four usages of CASE() statements (a combined sketch follows the list below); if you know of other use cases, please contact featherman@wsu.edu

1. To categorize data into groups (i.e., clusters of similar data based on some categorical attribute, or based on values of a dimension or measure). When you put data into groups you can compare the groups (intergroup analysis, akin to ANOVA), and you can also dig deeper into a group (within-group analysis) to examine the patterns of that group (e.g., all juniors at a local university). The category created makes a good slicer on a report, useful for segmenting data to make sense of it. Different groups act differently, and you often need to put similar data points into a group so you can examine the phenomenon.

2. To stop a SELECT query from crashing or throwing errors, such as the dreaded divide-by-zero error.

3. When you put records into groups, you can then change the granularity of the analysis. The example given is to put US states into regions using a CASE statement, then examine the measures by region.

4. CASE processing allows you to make more elaborate calculations in your SELECT statements. The example given is to compute a new column's value with different calculations depending on country; the CASE statement performs a different calculation for each value of some dimension.
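Here is the sketch promised above, illustrating usages 1 through 3 in one SELECT. The table dbo.CustomerSummary and its columns are hypothetical names used only for illustration:

    SELECT CustomerName,
           CASE WHEN TotalSales >= 100000 THEN 'Top customer'        -- usage 1: text category
                WHEN TotalSales >=  25000 THEN 'Emerging customer'
                ELSE 'Occasional customer' END        AS [Segment],
           CASE WHEN Orders = 0 THEN 0                               -- usage 2: divide-by-zero guard
                ELSE TotalSales / Orders END          AS [Avg Order Value],
           CASE WHEN State IN ('WA','OR','ID') THEN 'Northwest'      -- usage 3: coarser granularity
                WHEN State IN ('CA','NV','AZ') THEN 'Southwest'
                ELSE 'Other' END                      AS [Region]
    FROM dbo.CustomerSummary;   -- hypothetical pre-summarized table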

Documentation

  1. Categorizing-data-with-CASE-9-24-21
  2. Removing-Rookie-Errors-9-27-21 – This sample analysis using CASE() processing highlights some errors that rookie analysts make. In summary: look at data at different levels of granularity so that you can make sense of it, don’t jump to conclusions, and try not to accept the first interpretation of the data that you generate.
  3. An interesting CASE example using local variables – This content was student inspired and provides both an interesting example, coverage of new material (local variables), and a peek into future modules.
  4. a) SQL query to categorize states into regions – CASEStatebyRegion
    b) Sample Excel charts – RegionsStatesByMonth

Videos – There is a video series here that will assist you. Watch video 5
There is a video series here that can help you. Watch video 6

CASE example and usage of local variables


Module 5 – Sub-queries – There are some thorny query problems that can be fixed by breaking one query into two or three pieces. If you run two queries, you can feed the results from one query into the next. For example, if you want a list of machines that have scrap rates higher than average, first you need to calculate and store the average. You can also create new columns of metrics and add them to your results, even bringing in data from a second fact table. Sub-queries can be used to bring together data about an attribute from different transaction-data fact tables.
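A minimal sketch of the scrap-rate example from the paragraph above; dbo.MachineStats is a hypothetical table name:

    -- The inner query runs first, computing the overall average scrap rate;
    -- the outer query then keeps only the machines above that average.
    SELECT MachineID, ScrapRate
    FROM dbo.MachineStats
    WHERE ScrapRate > (SELECT AVG(ScrapRate) FROM dbo.MachineStats)
    ORDER BY ScrapRate DESC;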

Documentation

  1. SubQueries 10-5-21
  2. More on SubQueries 10-6-21 – A peek in PowerBI’s DAX formulas is provided.
  3. SubQueries for Accountants 10-5-21
  4. Taking-a-peek-ahead 3-10-21 – this document uses variables in two new ways to foreshadow future modules.
  5. Videos – There is a video series here that will assist you. Watch videos 11, 12
  6. More Subquery magic! Explore the dark secrets of data explosion, and the promise of subqueries to rid the world of this strife!


Module 6 – Using Arrays to Pull Data Together, and some DBA Data Management Techniques – sometimes data just doesn’t want to join! This can happen when the data is at different levels of granularity. For example, you can have a table of already-summarized retail store totals, and you want to add columns of data summarized by different levels of a dimension (city, state, region). Data in a join query needs to be at the same level of granularity, so this simple request can be impossible unless you can add columns one at a time. Arrays (aka array tables, aka SQL table variables) let you build columns of data with few restrictions. You can then save the compiled array to an empty SQL Server table and report the data. You could also have a reporting tool run the query, consume the returned dataset, and produce updated reports and dashboards.
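A sketch of the array (table variable) pattern: declare the table variable, load it with a first pass, then UPDATE in a column computed at a coarser granularity (AdventureWorksDW2014 names assumed):

    DECLARE @Results TABLE
           (City NVARCHAR(30), [State] NVARCHAR(50),
            CitySales MONEY, StateSales MONEY);

    -- First pass: load city-level totals.
    INSERT INTO @Results (City, [State], CitySales)
    SELECT g.City, g.StateProvinceName, SUM(f.SalesAmount)
    FROM FactResellerSales AS f
    JOIN DimReseller  AS r ON f.ResellerKey  = r.ResellerKey
    JOIN DimGeography AS g ON r.GeographyKey = g.GeographyKey
    GROUP BY g.City, g.StateProvinceName;

    -- Second pass: add a column at a different level of granularity (state totals).
    UPDATE a
    SET a.StateSales = s.Total
    FROM @Results AS a
    JOIN (SELECT [State], SUM(CitySales) AS Total
          FROM @Results GROUP BY [State]) AS s
      ON a.[State] = s.[State];

    SELECT * FROM @Results;   -- report, or save into a permanent table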

Documentation
a) Featherman-TSQL-Analytics-Table-Variables-10-14-21 – This document shows how to create and leverage arrays in SQL. Arrays are integral to data management, and their versatility is demonstrated with an in-depth analysis of different techniques that can be leveraged.

b) Featherman-TSQL-Analytics-Table-Variables-Part-2-10-18-21 – This second arrays document provides further examples.

c) More-on-Arrays-10-20-21 – This document provides the code for video #3 below. The document focuses on the INSERT INTO and UPDATE SET commands, which are useful to load arrays with data and produce interesting columns of analytics (which often give increased insight into profits and performance).

d) Motivation-for-Window-Functions 10-22-21
This document does not provide new content, but rather has queries that demonstrate simple array coding. The code is obviously cumbersome and needs improvement (which motivates our need to go further and learn window functions).

Videos

These videos are demonstrations of each query in the first doc file above. Even if they make sense while you watch the video, it will be useful to read the document as well, as more insights are provided. The only way to learn the content and be comfortable writing this type of code for an exam and a career is to copy each query from the doc file, run it in SSMS, and then alter the queries to experiment. Ask any DBA: arrays are a core data management technology.

  1. Part 1

  2. Part 2

  3. There is a video series here that can help you. Watch video 3.

  4. Other Training Materials
    a) Suggested study music


Module 7 – The Data Management Case for Stored Procedures and Analytics Paradigms

Stored procedures are used to automate the data management and analytics processes performed in regular report generation and ETL data management.

Typically SQL is used to pull together the dataset (or refresh the database tables) prior to creating data visualizations such as reports and dashboards. These SQL queries can grow to hundreds of lines of code, so each query should be ‘built once, used often’. SQL stored procedures are the object most commonly used to save and execute SQL queries. Analysts like to use Excel pivot charts, so you are shown how to call (execute) a stored procedure from Excel. Ditto for ASP.NET webpages, SSRS reports (Report Builder now), PowerBI, and Tableau visualizations.
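A minimal sketch of the ‘built once, used often’ idea, assuming AdventureWorksDW2014 names; the procedure name and parameter are illustrative only:

    -- Wrap a reusable query in a parameterized stored procedure.
    CREATE PROCEDURE dbo.GetSalesByState
        @State NVARCHAR(50)          -- the caller supplies the state to report on
    AS
    BEGIN
        SELECT g.City, SUM(f.SalesAmount) AS TotalSales
        FROM FactResellerSales AS f
        JOIN DimReseller  AS r ON f.ResellerKey  = r.ResellerKey
        JOIN DimGeography AS g ON r.GeographyKey = g.GeographyKey
        WHERE g.StateProvinceName = @State
        GROUP BY g.City;
    END;
    GO

    -- Callable from SSMS, Excel, Report Builder, PowerBI, or an ASP.NET page:
    EXEC dbo.GetSalesByState @State = N'Washington';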


Documentation

  1. Turning-your-Query-into-a-Stored-Procedure – here you turn your queries into corporate assets
  2. Using-Stored-Procedures-with-OUTPUT-variables 10-29 – this file demonstrates how to utilize stored procedures that return calculated results to local variables.
  3. The-Business-Case-for-Corporate-Use-of-Stored-Procedures 4-4-21
  4. Different-Analytics-Paradigms
  5. ETL-discussion-11-20-20

Optional Development Projects

  1. Calling a stored procedure from Report Builder – Reports can supply values to parameters and then call embedded stored procedures
  2. Using VBA and ADO in Excel to call a parameterized T-SQL stored procedure – this is a complement to the Management Science 470 course which uses VBscripting inside Excel.
  3. Running-a-Stored-Procedure-from-Visual-Studio – this ASP.NET webpage uses ADO.NET objects to run SQL stored procedures using data adapters.

Videos – There is a video series here that can help you. Watch videos 1 and 2.

    Creating a SQL Stored Procedure

    Creating a SQL Stored Procedure and calling it from ReportBuilder


Module 8 – Window/Table Functions – This module generates very useful and popular analytics: running totals (such as month-to-date) and comparisons between two time periods, such as comparing monthly sales to the prior month or to the same month of the prior year(s). The functionality here complements the MIS 441 coverage of PowerBI and Excel’s DAX time-intelligence functions such as TOTALMTD, TOTALYTD, SAMEPERIODLASTYEAR, etc.
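A sketch of the two marquee techniques, a running total and a prior-period comparison, assuming AdventureWorksDW2014 names:

    -- Summarize to monthly grain first, then apply window functions.
    WITH Monthly AS (
        SELECT YEAR(OrderDate)  AS Yr,
               MONTH(OrderDate) AS Mo,
               SUM(SalesAmount) AS Sales
        FROM FactResellerSales
        GROUP BY YEAR(OrderDate), MONTH(OrderDate)
    )
    SELECT Yr, Mo, Sales,
           SUM(Sales) OVER (PARTITION BY Yr ORDER BY Mo) AS YTD_RunningTotal,
           LAG(Sales) OVER (ORDER BY Yr, Mo)             AS PriorMonthSales,
           Sales - LAG(Sales) OVER (ORDER BY Yr, Mo)     AS MoM_Change
    FROM Monthly
    ORDER BY Yr, Mo;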

  1. Documentation  – Window-functions-11-11-21
  2. Videos – There is a video series here that can help you. Watch videos 8,9
  3. Video on SQL Window Functions
  4. Displaying Window Functions using PowerBI
    a) Charting Window Functions
    b) video
  5. Tableau Table Functions are very similar to TSQL Window Functions

Module 9 – DAX and Reporting data

This module was designed with two purposes. First, it presents a metrics-building approach for the emerging finance/accounting analyst who will use Excel and PowerBI every day; some innovative shortcuts are provided by the new DAX expression builder. The remainder of the module provides a pause from SQL programming and further explores MSFT Report Builder (SSRS), PowerBI, and Tableau.

Documentation
1. Using Dax To Create Calculated Columns 12-2-20
2. DAX-date-functions-12-4-20 – functionality for month-to-date and month-over-month analysis. For this second example, we need a database that has a Date table.

You can implement the DAX Date functions in the following file:
Contoso-Sales-Sample-for-Power-BI-Desktop

3. One more DAX Adventure: Revenue Analytics
Revenue Analytics Exercise #1

Data Visualizations
Let's put more focus on visualizing data. This section includes videos for Report Builder (the new name for SSRS), Tableau, and PowerBI. While you can create calculated fields in each reporting software, there are many reasons why it is optimal to place the calculations, JOINs, etc., inside the SQL query that is saved as a stored procedure and called from the reporting (aka data visualization) software. If a work team or department can agree that the calculations go into reusable SQL stored procedures, then the reports will be based on the same version of the truth. While each employee can use a stored procedure to retrieve data, they can also add value to the report by adding more calculations. If the added value is useful, it should be verified, institutionalized, and automated.

  1. Documentation – Scatterplots
  2. Use these SQL queries inside PowerBI to visualize scatterplots – SQL-for-ScatterGrams 4-10-20
  3. SQL and instructions to build PowerBI – Tachometer chart
  4. Here is a nice charting best practices document
  5. Videos
    a) Report Builder Intro

    b) Report Builder: Creating the Drill-down Report

    c) Report Builder: Cross-tab report with Area Chart

    d) Report Builder: Parameterized Reports

    e) Tableau Intro

    f) Tableau Date Charts

    g) Tableau Table Calcs – watch this video to understand windows of data and SQL window functions.

    h) Tableau Difference Charts

    i) PowerBI maps, charts, and date/time analysis – Police data is examined

    j) PowerBI Scattergrams – when you want to examine relationships between two or three measures, perhaps over time then you can use the scattergram with a play axis for time. Relationships between business performance indicators can provide insight into the relationship between factors. Many data relationships are studied in the video.

     

  6. Power BI Tachometer control – When you want to compare actual business performance against forecast, you can use a tachometer. Many KPIs are percentages, and this control lets you see insights in the data quickly.
  7. Power BI WordCloud report – Text analytics is an attempt to make sense of what people are collectively saying about a product, event, etc. The technology is simple: count the number of times a word appears in the column; the higher the count, the bigger the word in the visualization. A word cloud can be used to assess consumer sentiment about a topic. Textual responses allow quicker access to emotional responses, which are shortcuts to customer insights. The information gleaned from words is an interesting complement to that generated from numbers!

Module 10 – Special topics – Advanced data shaping techniques

  1. Dynamic SQL 11-30-21 – This document introduces an advanced way to produce dynamic lists for the SQL IN() clause of a PIVOT or GROUP BY query.
  2. Moving Data into New Columns – this document records the methodology and learning that resulted from an MBA@WSU project with Pullman Regional Hospital, made possible by Ruben Mayes, the greatest Coug running back in WSU history.
  3. Using a two-step data compiling process – TSQL-Analytics-Loops-and-CASE-ETL-Methods-3-24-20. If you can read and understand this document, you are ready for a DBA position.
  4. Tableau’s Level of Detail (LOD) – 15 use cases – The first use case of the Tableau content linked here shows an alternative way of doing the two-step data prep demonstrated in the above example. The Tableau programming functionality (such as LOD) is super powerful and concise, and it is worth your time to investigate. LOD content provided here and here.