northwind

Exporting Data – The SQL Way

Pre-requisites: Microsoft SQL Server, some knowledge of SQL, and a sample database, Northwind. You can find Northwind sample databses from this webpage https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/northwind-pubs

Criteria

From time to time, you may get a request for an export of customer’s data from clients. A few things to consider when you plan to do an export.

  • What do they need the export for?
  • What file format do they need the export to be saved to?
  • What kind of data does the client need? How should the headers be ordered? Does it even need a header?
  • Do we need to remove duplicate data sets?
  • If the data would contain any dates, how does the client wants those dates formatted?
  • Check if the data is protected under GDPR. Has the customer (not client) requested for their data to not to be transferred to third parties? If they have, then their data should not be included in the data export
  • How does the client wishes to recieve the data export? FTP or by email?

If the client plans to import the data into a different system, you may need to format the export for their system, for example, date and time stamps. Their system may only accept CSV/Excel files with or without headers.

The client may want us to remove any data that are duplicates or flagged as “Do Not Transfer”. Check with the client on the above criteria.

Suppose the client needs an export from Northwind database, the export must contain a summary of order, customer, employee and shipping details. Duplicates needs to be removed from the data export. The file has to be in CSV format.

Tip: Save your SQL query to a file. You might be asked at a later date for another copy of the export with new data. Re-using this query will save you time and effort from writing up a new SQL query

SQL Query

USE [Northwind] SELECT ord.OrderID ,ord.CustomerID ,ord.EmployeeID ,ord.OrderDate ,cust.ContactName ,cust.CompanyName ,CONCAT(emp.FirstName, ' ' , emp.LastName) as EmployeeName ,ord.ShipName ,ord.ShipCity FROM Orders as ord LEFT JOIN Customers as cust ON cust.CustomerID = ord.CustomerID LEFT JOIN Employees as emp ON emp.EmployeeID = ord.EmployeeID ORDER BY ShipCity
Code language: SQL (Structured Query Language) (sql)

This is how the data export would look like in SQL

OrderIDCustomerIDEmployeeIDOrderDateMonthNameYearContactNameCompanyNameEmployeeNameShipNameShipCity
10363DRACD426/11/1996November1996Sven OttliebDrachenblut DelikatessenMargaret PeacockDrachenblut DelikatessenAachen
10391DRACD323/12/1996December1996Sven OttliebDrachenblut DelikatessenJanet LeverlingDrachenblut DelikatessenAachen
10797DRACD725/12/1997December1997Sven OttliebDrachenblut DelikatessenRobert KingDrachenblut DelikatessenAachen
Data table

Now that we have data, we can copy and paste the SQL results into a spreadsheet. Click on the top left uppermost of the results (the white corner). This will select all of the cells.

SQL result showing customer data and shipping details

Then right click and select “Copy With Headers”

It shows submenu listing options to Copy, Copy With Headers, Select All, Save results as. The option Copy with Headers is highlighted.

Export

Create a new spreadsheet in Excel or with any other alternative application. Paste in your SQL data results. Watch out for any special characters especially any commas inside any of the text. Any commas in your data results can mess up your CSV data export by offsetting the columns.

Save your file as an Excel file for now e.g. export.xslx.

Spreadsheet showing customer data

We need to format the OrderDate. Right now, the values under OrderDate are showing as 00:00:00, no dates. Click on the column, OrderDate, to highlight all cells.

Go to the Home tab and then look for the Number format in the Number section.

Scroll down and select “Short Date”.

Your data is displaying the date format.

The next step would be to remove duplicates. Go to the tab, Data and select Remove Duplicates.

You will get a prompt asking you which columns do you want to check for any duplicates. You can tick the “My data has row headers” to see your row headers. You can select all or a few columns. Click OK to proceed.

You will get a message to confirm if any duplicates were removed. I didn’t get any this time because I had selected all columns. If you re-run the same process and then select fewer columns this time. It depends on what the client is looking for. For example, select CustomerID and EmployeeID, you will find that duplicates have been removed. In fact, half of your data may be deleted.

Removed duplicates by CustomerID and EmployeeID

Save your spreadsheet. You can export the data as a CSV file by going to File and then Export.

Choose to export as CSV (Comma Delimited) and click on Save As. Set your filename.

Double check that your data export looks fine. You can open the new file through Excel or Note, check that the columns are intact and no special characters are messing up the data integrity.

Passing on Data

If you are going to send on data to the client, zip the file with a password. The password must be relayed to the client in a separate communication. Don’t send it along with the zip file in the same communication.

SQL – Concatenate Names and get totals

In this short blog post, I’ll show you how to concatenate an employee firstname and last name, then get the total orders made under that employee.

I’m using Northwind database available from Microsoft. It’s a free sample database. You can replicate the same set of results as shown below.

USE [Northwind] SELECT emp.EmployeeID ,CONCAT(emp.FirstName, ' ' , emp.LastName) as EmployeeName ,ord.Total FROM [Employees] as emp LEFT JOIN (SELECT COUNT(*) as Total, EmployeeID FROM [Orders] GROUP BY EmployeeID) as ord ON emp.EmployeeID = ord.EmployeeID ORDER BY emp.LastName
Code language: SQL (Structured Query Language) (sql)

In the first column, it shows the EmployeeID.
In the second column, the CONCAT function wraps the firstname, a whitespace and a lastname, making up the fullname of the employee.
In the third column, it lists the orders total. This results comes from the LEFT JOIN query further in the SQL.

The LEFT JOIN shows it counts all orders and that column is named as Total, it then lists the EmployeeID. It’s grouped by EmployeeID. The LEFT JOIN query is wrapped and named as ord. It’s joined up with the EmployeeID from the Employees table.

The whole dataset is sorted by LastName. You can see the results below.

EmployeeIDEmployeeNameTotal
5Steven Buchanan42
8Laura Callahan104
1Nancy Davolio123
9Anne Dodsworth43
2Andrew Fuller96
7Robert King72
3Janet Leverling127
4Margaret Peacock156
6Michael Suyama67