Azure ASR Error- 78052 Master target contains different types of scsi controllers.

Microsoft Azure

This is a bit of a self-explanatory one, but I thought I would mention it anyway. When you build an ASR Master Target server make sure if you have more than one SCSI controller that they are of the same type, it doesn’t matter what type they are (LSI Logic SAS, VMware Paravirtual, ect..) but they both need to be the same or you will get the following error on the Azure portal when you attempt to fall back the machine to On-premeses.

 

Azure ASR Error- 90068 disks specified not present

Microsoft Azure

Quick fyi for anyone using Azure ASR, make sure if you are protecting a virtual machine located in Azure to unselect the temp drive disk D when you are adding the machine to ASR protection. If you try and protect the disk to on-premises, you will get the below error message. If you do you will need to delete the protection and reprotect without drive D. The below error only occurs when you try and reprotect to on-premises, it seem to work fine if you reprotect to another azure location.

 

Traffic Manager Endpoint monitor and ADFS /adfs/probe

Microsoft Azure, Windows

Microsoft has a very nice post on how to setup Traffic manager in front of an ADFS farm for high availability, where both sites are in Azure but in different GEO locations or one in Azure and one on premises. The Article is located here: https://docs.microsoft.com/en-us/azure/active-directory/active-directory-adfs-in-azure-with-azure-traffic-manager. What the article lacks is how to setup proper ADFS monitoring, which monitors both tte WAP and the ADFS service, at the moment the article only goes into details which monitor the WAP service.

So this post will go over how to configure your environment so the health point will report the status of both WAP and ADFS.

Some info before we begin:

  • The solutions is achieved by monitoring the /adfs/probe/ on the ADFS server via the WAP proxy
  • The solution will report failure if the WAP proxy is not forwarding or the ADFS service is down. So we are monitoring the whole solution.
  •  It will work if you have an external load balancer in front of the WAP servers and an internal one in front of the ADFS servers, for simplicity I will outline how it’s done on the non-load-balanced solution but it’s the same procedure for both.
  • You can’t monitor /adfs/probe on the WAP server as that will only give you the status of the WAP server
  • You can create a rule on the WAP server to redirect /adfs/probe to the ADFS server, but it will get ignored and show you the status of the WAP server.
  • I tested this on Server 2016 but it will work for 2012 R2 as well
  • If you are using 2012 R2 make sure you update your WAP to the latest version so you can forward HTTP traffic
  • We use HTTP as this prevents certificate problems and because Traffic manager does not support SNI.
  • You can’t monitor the “/federationmetadata/2007-06/federationmetadata.xml” because the way you set this up for Traffic manager means you are monitoring the ADFS on a different DNS so the request will not be forwarded.

Essentially this is what we are doing

adfs_probe_check

Once you setup the environment as per Microsofts Article above we need to do the following:

The variables for my test environment:

  • ADFS URL and Federation Service Name – test123.blah.local
  • Traffic Manager DNS – adfstest.trafficmanager.net
  • WAP server public IP dns (this can be replaced by a load balancer) – http://mytestadfsa.westeurope.cloudapp.azure.com
  • Custom monitor path (you can choose anything but the default which is /adfs/) –  /adfsprobe/

The Steps:

  • Change the Traffic Manager Configuration to point to our custom monitor path for the endpoint monitoring

configuration-microsoft-azure

  • Create an HTTP rule on the WAP server in the Remote Access Management Console to forward (via Pass- through) the WAP DNS + our custom monitor path to the ADFS server. I assume that your WAP server host file has been modified to point the ADFS URL to the ADFS internal IP or load balancer IP

wap-rule

iis-url-rewrite

  • The rule to be created is Reverse Proxy with the following settings:

arp-rule

  • And finally change your Public DNS record and create a CName for your ADFS URL (test123.blah.local) to point to the traffic manager DNS name (adfstest.trafficmanager.net)

And you are done.

Powershell Add-Computer error when executed remotely.

Windows

When you execute the PowerShell command: “Add-Computer -DomainName “contoso.com” -Credential $domainjoinuser -Restart” remotely or in a non-interactive environment you may get the following error:

The root of the problem is (given that your password is correct) when running things interactively the domain is pre-appended and as such you only need to provide the user. But in a non-interactive environment, the domain is not known as such it’s a very simple fix, make sure you either include the short domain names like “contoso\DMAdmin” or the full FQDN “DMAdmin@contoso.com. The error occurred for me by running an Azure custom script which called a PowerShell script non-interactively.

The ACL RemoveAccessRule Not Working

Windows

If you try and modify the ACL via PowerShell but the command RemoveAccessRule is not working, by that I mean you run it no errors come up but the rules and not being removed.  The problem is that inheritance is turned on and you are trying to remove a rule that is obtained from inheritance. To fix this problem you first need to disable inheritance, save the rule and then get the acls again. After that remove will work. Code can be found below:

 

New Virtual Disk Error “UseMaximumSize” (Storage Pool)

Windows

This is a heads up post for anyone who is creating a new virtual disk from a storage pool in Storage Spaces. If you are creating a disk and want to use the “Maximum size” parameter then you have to make sure that the Provisioning type is set to Fixed and not Thin. This is a normal expected behaviour as the Thin disk expands automatically and take space as it needs so you can’t pre-located 100% of the size, you will need to specify an initial disk size. For fixed size “-ProvisioningType Fixed” you can use the “-UseMaximumSize” parameter.

The unfriendly Powershell Error:

The GUI Error:

virtualdisk-error

Spark unable to see tables/dbs in the metastore when using the Cloudera (CDH) Quickstart Docker image

Hadoop

If you’ve used the Cloudera CDH Quickstart docker container you might have found that if you create some databases and tables using Hive you may not be able to see them when using spark. The  reason is because the spark configuration directory doesn’t have the hive-site.xml but there is a workaround that can be used without having to restart spark.

This is explained in the link https://www.tutorialspoint.com/spark_sql/spark_sql_hive_tables.htm
“Hive comes bundled with the Spark library as HiveContext, which inherits from SQLContext. Using HiveContext, you can create and find tables in the HiveMetaStore and write queries on it       using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.xml, the context automatically creates a metastore called  metastore_db and a folder called warehouse in the current directory.”
If you have started spark already you will need to delete the Sqlite based metastore that is created otherwise it will ignore our workaround. Which is what appears to be happening after running pyspark or spark-shell, perhaps because you have to use the –files parameter the first time around or it never uses the metastore configured in the hive-site.xml
The workaround involves passing the hive-site.xml via the –files parameter:

pyspark --master yarn --files /etc/hive/conf/hive-site.xml
You should then be able to see the databases/tables.

 >>> for db in sqlContext.sql("SHOW DATABASES").collect():

  ...     print db
 16/12/21 13:55:43 INFO scheduler.DAGScheduler: Job 0 finished: collect at <stdin>:1, took 1.424182 s

 Row(result=u'default')

 Row(result=u'trainingdata')

HDInsight Cluster Scaling

Microsoft Azure, Powershell

I recently had to come up with options for scaling a Azure HDInsight cluster out (adding worker nodes) or in (removing worker nodes) based on a time based schedule.    At the time of writing HDInsight doesn’t allow you to pause or stop a cluster – you have to delete the cluster if you do not want to incur costs. One way of potentially reducing costs is to use workload specific clusters with Azure Blob Storage and Azure SQL Database for the metastore, then scaling down the cluster outside of core hours when there is no processing to be done.

There are a number of ways of doing this including:

  • Using Azure Resource Manager (ARM) template that has a parameter for the number of worker nodes; then simply running the template at a scheduled time with the    appropriate number of worker nodes.
  • Using the PowerShell module to scale the cluster with Set-AzureHDInsightCluster cmdlet and either running the script as a scheduled task from a VM or using Azure Automation

This post is going to show how to use a script that can be run either via Azure Automation or via a scheduled task running on a VM, the solution consists of:

  • An Azure Storage Account containing XML configuration files that includes information on the cluster, the subscription within which it resides, the number of     worker nodes when scaling out the cluster, the number of worker nodes when scaling in the cluster, a list of email addresses of people that are to be notified when a   scaling operation takes place
  • A PowerShell script that uses the Azure PowerShell module to scale the cluster out and send email notifications
  • Optionally an Azure Automation account from which to schedule the script to run

Azure Storage

Create a storage account (a LRS storage account is sufficient) and a private container, in the container for each cluster store a XML configuration file of the following format:

 <ClusterConfiguration>

     <SubscriptionName>MySubscriptionName</SubscriptionName>

     <ResourceGroupName>MY-RG-0001</ResourceGroupName>

     <ClusterName>vjt-hdi1</ClusterName>

     <MinWorkers>1</MinWorkers>

     <MaxWorkers>3</MaxWorkers>

     <Notify>hadoopsupport@acme.com,team@acme.com</Notify>

 </ClusterConfiguration>

Where:

  • <SubscriptionName> is the subscription containing the cluster to be scaled out/in
  • <ResourceGroupName> is the resource group containing the HDInsight cluster to scale
  • <ClusterName> is the name of the cluster to scale
  • <MinWorkers> is the number of worker nodes in the cluster when performing a scale IN operation
  • <MaxWorkers> is the number of worker nodes in the cluster when performing a scale OUT operation
  • <Notify> is a comma separated list of email addresses to which notifications are to be sent

NOTE: It is important that the XML configuration file is in UTF-8 format.

While one XML configuration file could conceivably contain the information for every cluster there are a number of benefits to having one configuration file per cluster:

  • No need to write logic to parse the file and find the relevant part for the target cluster
  • I can keep each cluster configuration in version control and separately modify them
  • A mistake in the file is less likely to affect all clusters

PowerShell script

The HDInsight cluster scaling PowerShell script is available in my GitHub repository.

The script will need to be modified to suit your environment but the key elements of the script are described here. Since the script can be either run from Azure Automation or via a scheduled task from a Windows server, it supports sending emails from an internal mail server. The script assumes no authentication is required for the internal mail server (but it is trivial to add authentication support for the internal mail server).

Email Providers

The script supports sending emails via SendGrid, Office365 or via internal / on-premise mail servers. At present the latter is supported via hard-coded mail server details but these could be   provided via a XML configuration file or parameters to the script.

Storing credentials on disk

There are other ways of doing this but when the script runs via a scheduled task from a VM it expects that the credentials for the SendGrid / Office365 email account to be stored in encrypted from on disk in an XML file. This is achieved using the Windows Data Protection API.
     1. First open a PowerShell console *as the user under which the scheduled task will run*. E.g.
         runas /user:SVC-RTE-PSAutomation powershell.exe
     2. Then read the password in
         $Password = Read-Host -assecurestring "Please enter your password"
         $Password | ConvertFrom-SecureString
     3. Paste it into the password tag of the XML file
      <credentials>
         <credential>
             <username>emailaccount@acme.onmicrosoft.com</username>
             <password>ReplaceWithActualPassword</password>
         </credential>
      </credentials>

Key Elements of the script

Since the script can be executed via Azure Automation or via a scheduled task, if the $PSPrivateMetadata.JobId property exists then the script is running in Azure Automation. following snippet checks whether the job is running in Azure Automation:
If( -not($PSPrivateMetadata.JobId) )
Next the script contains two try catch blocks one that authenticates using the “Classic” Azure PowerShell module, and the other with the ARM based Azure PowerShell module.
The Get-ClusterScalingConfigurationFile function uses the classic Azure PowerShell module cmdlets to retrieve the storage account key and creates an Azure Storage Account context which is then used to download the blob/file from the storage accoun uses the classic Azure PowerShell module cmdlets to retrieve the storage account key and creates an Azure Storage Account context   which is then used to download the blob/file from the storage account. Note it is important the the cluster configuration file is in UTF-8 format else when we try to parse the contents of the file as XML it will fail (especially if it’s UTF-8 WITH BOM).
 [byte[]] $myByteArray = New-Object -TypeName byte[] -ArgumentList ($BlobReference.Length)
 $null = $BlobReference.ICloudBlob.DownloadToByteArray($myByteArray,0)
 [xml] $BlobContent = [xml] ([System.Text.Encoding]::UTF8.GetString($myByteArray))
The Get-HDInsightQuota function uses the Get-AzureRmHDInsightProperties cmdlet to obtain the number of cores used and available for HDInsight, the script then uses information when scaling out the cluster.
First we use the Get-AzureRmHDInsightCluster cmdlet to get the cluster’s Azure Resource ID, then pass this to the Get-AzureRmResource cmdlet to then obtain the VM sizes currently in use by the cluster – this information is not provided by the Get-AzureRmHDInsightCluster.
 $HDInsightClusterDetails = Get-AzureRmHDInsightCluster | Where-Object {
    $_.Name -eq $ClusterName
  }
  $ClusterSpec = Get-AzureRmResource -ResourceId $HDInsightClusterDetails.Id |
  Select-Object -ExpandProperty Properties |
  Select-Object -ExpandProperty computeProfile