The data lake serves as an alternative to the multiple information silos typical of enterprise environments. The data lake does not care where the data came from or how it will be used. It is indifferent to data quality and integrity. It is concerned only with providing a common repository from which to perform in-depth analytics; only at that point is any sort of structure imposed upon the data.
As the popularity of the data lake grows, so too does the number of vendors jumping into data lake waters, each bringing its own idea of what a data lake entails. While any data lake solution will have at its core a massive repository, some vendors also roll in an analytics component or two, which is exactly what Microsoft is planning to do. As the following figure shows, the Azure Data Lake platform comprises three primary services: Data Lake Store, Data Lake Analytics, and Azure HDInsight.
Data Lake Store provides the repository necessary to persist the influx of data, and Data Lake Analytics offers a mechanism for picking apart that data. Both components are now in public preview. Microsoft has also rolled HDInsight, a service that offers a wide range of Hadoop-based tools, into the Data Lake mix for additional analytic capabilities. To facilitate access between the storage and analytic layers, the Data Lake platform leverages Apache YARN (Yet Another Resource Negotiator) and WebHDFS-compatible REST APIs.
Azure Data Lake Store
Microsoft describes Data Lake Store as a “hyper-scale repository for big data analytic workloads,” a mouthful, to be sure, but descriptive nonetheless. The service will let you store data of any size, type, or ingestion speed, whether originating from social networks, relational databases, web-based applications, line-of-business (LOB) applications, mobile and desktop devices, or a variety of other sources. The repository provides unlimited storage without restricting file sizes or data volumes. An individual file can be petabytes in size, with no limit on how long you keep it there.
Data Lake Store uses a Hadoop file system to support compatibility with the Hadoop Distributed File System (HDFS), making the data store accessible to a wide range of Hadoop-friendly tools and services. Data Lake Store is already integrated with Data Lake Analytics and HDInsight, as well as Azure Data Factory; however, Microsoft also plans eventual integration with services such as Microsoft's Revolution R Enterprise, distributions from Hortonworks, Cloudera, and MapR, and Hadoop projects such as Spark, Storm, and HBase.
To protect the data, Microsoft makes redundant copies to ensure durability and promises enterprise-grade security based on Azure Active Directory (AAD). The AAD service manages identity and access for all stored data, providing multifactor authentication, role-based access control, conditional access, and numerous other features.
To use AAD to protect data, you must first create AAD security groups to facilitate role-based access control in the Azure Portal. Next, you assign the security groups to the Data Lake Store account, which controls access to the repository for portal and management operations. Finally, you assign the security groups to the access control lists (ACLs) associated with the repository's file system. Currently, you can assign access control permissions only at the repository level, but Microsoft plans to add folder- and file-level controls in a future release.
Data Lake Store supports POSIX-style permissions exposed through the WebHDFS-compatible REST APIs. The WebHDFS protocol makes it possible to support all HDFS operations, not only read and write, but also such operations as accessing block locations and configuring replication factors. In addition, WebHDFS can use the full bandwidth of the Hadoop cluster for streaming data.
Data Lake Store also implements a new file system, AzureDataLakeFilesystem (adl://), for directly accessing the repository. Applications and services capable of using the file system can realize additional flexibility and performance gains over WebHDFS. Systems not compatible with the new file system can continue to use the WebHDFS-compatible APIs.
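For illustration, a file in the repository can be addressed through the new file system with a URI along these lines (the account name and path here are hypothetical):

    adl://mydatalakestore.azuredatalakestore.net/clickstream/2016/01/events.log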
Azure Data Lake Analytics
Data Lake Analytics is one of Microsoft's newest cloud offerings, having appeared on the scene only within the last couple of months. According to Microsoft, the company built Data Lake Analytics from the ground up with scalability and performance in mind. The service provides a distributed infrastructure that can dynamically allocate or de-allocate resources so customers pay only for the services they use.
As with similar cloud platforms, Data Lake Analytics users can focus on the business logic, rather than on the logistics of how to implement systems and process large data sets. The service handles all the complex management tasks so customers can develop and execute their solutions without worrying about deploying or maintaining the infrastructure to support them.
Data Lake Analytics is also integrated with AAD, making it easier to manage users and permissions, and with Visual Studio, providing developers with a familiar environment for creating analytic solutions.
A solution built for the Data Lake Analytics service is made up of one or more jobs that define the business logic. A job can reference data within Data Lake Store or Azure Blob storage, impose a structure on that data, and process the data in various ways. When a job is submitted to Data Lake Analytics, the service accesses the source data, carries out the defined operations, and outputs the results to Data Lake Store or Blob storage.
Azure Data Lake provides several options for submitting jobs to Data Lake Analytics:
- Use Azure Data Lake Tools in Visual Studio to submit jobs directly.
- Use the Azure Portal to submit jobs via the Data Lake Analytics account.
- Use the Data Lake SDK job submission API to submit jobs programmatically.
- Use the job submission command available through the Azure PowerShell extensions to submit jobs programmatically.
The U-SQL difference
A job is actually a U-SQL script that instructs the service on how to process the data. U-SQL is a new language that Microsoft developed for writing scalable, distributed queries that analyze data. An important feature of the language is its ability to process unstructured data by applying schema-on-read logic, which imposes a structure on the data as you retrieve it from its source. The U-SQL language also lets you insert custom logic and user-defined functions into your scripts, and it provides fine-grained control over how to run a job at scale.
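To make the idea concrete, here is a minimal sketch of a U-SQL script that applies schema on read. The file paths, column names, and extractor choice are hypothetical; the sketch assumes a comma-delimited source file already sitting in Data Lake Store:

    // Impose a schema on the raw file as it is read (schema on read).
    @produce =
        EXTRACT Veggie string,
                Price decimal
        FROM "/input/produce.csv"
        USING Extractors.Csv();

    // Write the rowset back out to the repository.
    OUTPUT @produce
    TO "/output/produce.tsv"
    USING Outputters.Tsv();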
U-SQL evolved out of Microsoft's internal big data language SCOPE (Structured Computations Optimized for Parallel Execution), a SQL-like language that supports set-oriented record and column manipulation. U-SQL is a hybrid language that combines the declarative capabilities of SQL with the extensibility and programmability of C#. The language also incorporates big data processing concepts such as custom reducers and processors, as well as schema on read.
Not surprisingly, U-SQL has its own peculiarities. Keywords such as SELECT must be all uppercase, and the expression language within clauses such as SELECT and WHERE uses C# syntax. For example, a WHERE clause equality operator takes two equal signs, and a string value is enclosed in double quotes, as in WHERE Veggie == "tiger nut".
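Continuing the earlier sketch (the @produce rowset and its values are hypothetical), a filter over that data follows these same rules, with uppercase keywords framing C# operators and literals:

    // C# expression syntax inside WHERE: == for equality, && for
    // logical AND, strings in double quotes, m-suffixed decimals.
    @filtered =
        SELECT Veggie, Price
        FROM @produce
        WHERE Veggie == "tiger nut" && Price > 10.0m;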
In addition, the U-SQL type system is based on C# types, providing tight integration with the C# language. You can use any C# type in a U-SQL expression. However, you can use only a subset of C# types to define rowset columns or certain other schema objects. The usable types are referred to as built-in U-SQL types and can be classified as simple built-in types or complex built-in types. The simple built-in types include your basic numeric, string, and temporal types, along with a few others, and the complex ones include map and array types.
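As a rough illustration (the variable names and values here are hypothetical), scalar variables can be declared with both simple and complex built-in types:

    // Simple built-in types follow C#.
    DECLARE @cutoff DateTime = DateTime.Parse("2016-01-01");
    DECLARE @category string = "produce";

    // Complex built-in types cover arrays and maps.
    DECLARE @veggies SQL.ARRAY<string> = new SQL.ARRAY<string>{ "kale", "tiger nut" };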
You can also use C# to extend your U-SQL expressions. For example, you can add inline C# expressions to your script, which can be handy if you have a small set of C# methods you want to use to process scalar values. In addition, you can write user-defined functions, aggregators, and operators in C# assemblies, load the assemblies into the U-SQL metadata catalog, and then reference the assemblies within your U-SQL scripts.
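For instance, a hypothetical script might apply C# methods and expressions inline to shape column values; compiled C# assemblies registered in the metadata catalog can then be pulled in with REFERENCE ASSEMBLY for anything more elaborate:

    // Inline C#: method calls and the conditional (ternary)
    // operator work in expressions just as they do in C#.
    @tiers =
        SELECT Veggie.ToUpper() AS VeggieName,
               (Price > 10.0m ? "premium" : "standard") AS Tier
        FROM @produce;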
Data Lake Analytics executes a U-SQL job as a batch script, with data retrieved in a rowset format. If the source data is in files, U-SQL schematizes the data on extract. However, the source can also be U-SQL tables or tables in other data sources, such as Azure SQL Database, in which case the data does not need to be schematized. In addition, you can define a U-SQL job to transform the data before storing it in a file or U-SQL table. The language also supports data definition statements such as CREATE TABLE so you can define metadata artifacts.
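As a minimal sketch (the table and column names are hypothetical, and the exact index and distribution clauses may differ across U-SQL releases), a managed U-SQL table can be defined and populated like this:

    // U-SQL tables require a clustered index and a distribution scheme.
    CREATE TABLE IF NOT EXISTS dbo.Produce
    (
        Veggie string,
        Price decimal,
        INDEX idx_produce CLUSTERED (Veggie ASC) DISTRIBUTED BY HASH (Veggie)
    );

    // Persist a rowset into the table.
    INSERT INTO dbo.Produce
    SELECT Veggie, Price FROM @produce;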