How ENGIE automates the deployment of Amazon Athena data sources on Microsoft Power BI

ENGIE—one of the largest utility providers in France and a global player in the zero-carbon energy transition—produces, transports, and deals in electricity, gas, and energy services. With 160,000 employees worldwide, ENGIE is a decentralized organization and operates 25 business units with a high level of delegation and empowerment. ENGIE’s decentralized global customer base had accumulated lots of data, and it required a smarter, unique approach and solution to align its initiatives and provide data that is ingestible, organizable, governable, sharable, and actionable across its global business units.

ENGIE built an enterprise data repository named the Common Data Hub to align its customers and business units around the same solution. ENGIE used AWS to create the Common Data Hub, a custom solution built using a globally distributed data lake and analytics solutions on AWS. The Common Data Hub empowers teams to innovate by simplifying data access and delivering a comprehensive set of analytics tools, such as Amazon QuickSight, Microsoft Power BI, Tableau, and more.

In 2018, the company’s business leadership decided to accelerate its digital transformation through data and innovation by becoming a data-driven company.

“Amazon Athena is a key service in the ENGIE data ecosystem. It makes it easy to analyze data in a serverless manner so there is no infrastructure to manage. We used Athena to quickly build operational dashboards and get insight and high business value from the data available in our data lake.”

– Gregory Wolowiec, chief technology officer at ENGIE

ENGIE uses Microsoft Power BI to create dashboards and leverages the power of Amazon Athena through the out-of-the-box connector for Microsoft Power BI in which, complete raw data sets are not downloaded to the user’s workstation. While users create or interact with a visualization, Microsoft Power BI works with Athena to dynamically query the underlying data source so that they are always viewing current data.

In a previous blog post, you learned how to manually configure all the required infrastructure to create Microsoft Power BI dashboards using Athena with Microsoft Power BI DirectQuery enabled. ENGIE automated the creation and configuration of the Athena connections on Microsoft Power BI Gateway and Microsoft Power BI Online to be able to scale and reduce the manual overhead. In this post, you learn how ENGIE is doing it today.

Solution overview

The following diagram illustrates the solution architecture to automate the creation and configuration of the Athena connections on Microsoft Power BI Gateway and Microsoft Power BI Online.

As described in the previous blog post, the AWS CloudFormation stack deploys two Amazon Elastic Compute Cloud (Amazon EC2) instances in a private subnet in an Amazon Virtual Private Cloud (Amazon VPC): one instance is used for Microsoft Power BI Desktop, and the other is used for the Microsoft Power BI on-premises data gateway. This stack uses t3.2xlarge instances because they have the minimal hardware requirements recommended. You can increase or decrease the EC2 instance type depending on the performance of the gateway.

Additionally, the CloudFormation template creates an AWS Glue table that gives you access to the dataset. It creates an AWS Lambda function as an AWS CloudFormation custom resource that updates all the partitions in the AWS Glue table.

In addition to the architecture presented in the previous blog post, this stack creates multiple AWS Systems Manager documents to configure the Athena data sources on Microsoft Power BI Gateway and on Microsoft Power BI Online. The Systems Manager documents are run by another Lambda function that is triggered when we create or delete an entry on Amazon DynamoDB.

From the security standpoint, all resources are deployed within an Amazon VPC (a logically isolated virtual network), and it uses Amazon VPC endpoints to communicate between resources within your Amazon VPC and AWS services without the need of crossing an internet gateway, NAT gateway, VPN connection, or AWS Direct Connect. Additionally, the Microsoft Power BI on-premises data gateway doesn’t require inbound connections. Additionally, authentication with Athena is done on Microsoft Power BI Desktop and on the Microsoft Power BI on-premises data gateway using AWS Identity and Access Management (IAM) profile.

The daily estimated cost of this architecture is $18 USD, mainly driven by the EC2 instances.

Walkthrough overview

For this post, we step through a use case using the data from the 2015 New York City Taxi Records dataset hosted on the Registry of Open Data on AWS. The data is already stored in an Amazon Simple Storage Service (Amazon S3) bucket in Apache Parquet format and is partitioned. For more information about optimizing your Athena queries, see Top 10 Performance Tuning Tips for Amazon Athena.

First, you deploy the CloudFormation stack with all the infrastructure required. Then, you use AWS Systems Manager Session Manager (see Starting a session (Systems Manager console)) and any remote desktop client to configure the Microsoft Power BI instance in order to create and publish your dashboard. Complete the following steps:

  1. Deploy the CloudFormation stack.
  2. On the EC2 instance that has the name tag PowerBiDesktop, install and configure the Simba Athena ODBC driver and Microsoft Power BI Desktop.
  3. Create your dashboard on Microsoft Power BI Desktop and publish it.
  4. On the EC2 instance that has the name tag PowerBiGateway, install the Simba Athena ODBC driver and Microsoft Power BI on-premises data gateway.
  5. Create Athena resources on the Microsoft Power BI Gateway instance and data source on Microsoft Power BI Online.
  6. View your report on Microsoft Power BI Online.
  7. Remove data source on Microsoft Power BI Online and Athena resources on the Microsoft Power BI Gateway instance.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create resources and prepare your environment

Deploy the CloudFormation stack by choosing Launch stack:

Keep note of the gateway name, because you use it later when configuring the Microsoft Power BI on-premises data gateway.

After you deploy the CloudFormation stack, you prepare your environment following the steps in the relevant sections from Creating dashboards quickly on Microsoft Power BI using Amazon Athena:

  • Logging in to your Microsoft Power BI Desktop instance – Access to your instance without a bastion
  • Installing and configuring Microsoft Power BI Desktop – Install the required software on the desktop
  • Creating an Athena connection on Microsoft Power BI Desktop – Configure the New York City taxi data source connection
  • Creating your dashboard on Microsoft Power BI Desktop and publishing it – Send the structure of your report to Microsoft Power BI Online
  • Logging in to your Microsoft Power BI on-premises data gateway instance – Install the required software on your gateway

Install and configure the Athena-enabled Microsoft Power BI on-premises data gateway

To set up your on-premises data gateway, complete the following steps:

  1. Download and install the latest Athena ODBC driver for Windows 64-bit.
  2. Download the Microsoft Power BI on-premises data gateway standard mode and launch the installer.
  3. For your gateway, choose On-premises data gateway (recommended).
  4. Accept the default values and choose Install.
  5. When the installer asks you to sign in, enter the email address associated with the Microsoft Power BI Pro tenant that doesn’t require MFA. (This should be the same user name and password that you provided when you launched the CloudFormation stack.)
  6. Choose Sign in.
  7. If asked to register a new gateway or migrate, restore, or take over an existing gateway, choose Register a new gateway.
  8. Give your gateway a name (use the same gateway name passed as a parameter when deploying the CloudFormation stack) and provide a recovery key.
  9. Choose Configure.

You should see a green check mark indicating the gateway is online and ready to be used.

Create Athena resources automatically on the Microsoft Power BI Gateway instance and Microsoft Power BI Online

To automate this process, you insert an entry on DynamoDB. The entry’s attributes have the DSN properties and the users that are allowed to use the data source on Microsoft Power BI Online.

  1. Launch AWS CloudShell from the AWS Management Console using either of the following methods:
    • Choose the CloudShell icon on the console navigation bar.
    • Enter cloudshell in the Find Services box and then choose the CloudShell option.
  2. Enter the following script:
    echo '#!/bin/bash
    
    stack_name=$1
    dsn_name=$2
    users=$3
    
    role_arn=$(aws cloudformation describe-stacks --stack-name $stack_name --query "Stacks[0].Outputs[?OutputKey=='\''DataProjectRoleArn'\''].OutputValue" --output text)
    aws_region=$(aws cloudformation describe-stacks --stack-name $stack_name --query "Stacks[0].Outputs[?OutputKey=='\''AWSRegion'\''].OutputValue" --output text)
    athena_bucket=$(aws cloudformation describe-stacks --stack-name $stack_name --query "Stacks[0].Outputs[?OutputKey=='\''AthenaOutputS3Bucket'\''].OutputValue" --output text)
    
    aws dynamodb put-item --table-name PowerbiBlogTable --item "{\"Name\":{\"S\":\"${dsn_name}\"}, \"AWSProfile\":{\"S\":\"${role_arn}\"}, \"AWSRegion\":{\"S\":\"${aws_region}\"}, \"Workgroup\":{\"S\":\"athena-powerbi-aws-blog\"}, \"S3OutputLocation\":{\"S\":\"${athena_bucket}\"}, \"S3OutputEncOption\":{\"S\":\"SSE_S3\"}, \"AuthenticationType\":{\"S\":\"IAM Profile\"}, \"Users\":{\"S\":\"${users}\"}}"
    ' > create_record.sh
  3. Give run permission on the created script:
    chmod u+x create_record.sh
  4. Run the script, passing as parameters your CloudFormation stack name, the DSN name that you want to create, and the users that you want to attach to the dataset (you can pass multiple users separated by a comma without spaces). For example:
    ./create_record.sh PbiGwStack taxiconnection [email protected]

This script inserts an entry with all the required properties in your DynamoDB table. When a new entry is added to DynamoDB, an event is captured by Amazon DynamoDB Streams and a Lambda function is triggered to run a Systems Manager document. The last document runs two scripts on the instance: the first one creates a new Athena ODBC DSN, and the second script creates a new data source on Microsoft Power BI Online.

View your report on Microsoft Power BI

To view your report, complete the following steps:

  1. Choose the workspace where you saved your report.
  2. On the Datasets + dataflows tab, locate the dataset, which has the same name as your report (for example, taxireport) and choose the options icon (three dots).
  3. Choose Settings.
  4. Choose Discover Data Sources.
  5. Expand Gateway Connection.
  6. Choose your gateway.
  7. For Maps to, choose taxiconnection.
  8. Choose Apply.
  9. Return to the workspace where you saved your report.
  10. On the Content tab, choose your report (taxireport).

You can now see your report online using the most recent data.

Remove Athena resources automatically on Microsoft Power BI Online and on the Microsoft Power BI Gateway instance

To automate this process, you remove an entry from the DynamoDB table. The entry’s attributes have the DSN properties and the users that are allowed to use the data source on Microsoft Power BI Online.

  1. Launch CloudShell.
  2. Enter the following script:
    echo '#!/bin/bash
    
    dsn_name=$1
    
    aws dynamodb delete-item --table-name PowerbiBlogTable --key "{\"Name\":{\"S\":\"${dsn_name}\"}}"' > delete_record.sh
  3. Give run permission on the created script:
    chmod u+x delete_record.sh
  4. Run the script, passing as parameters the DSN name that you want to delete. For example:
    ./delete_record.sh taxiconnection

This command removes an item from your DynamoDB table. When an entry is deleted from DynamoDB, an event is captured by DynamoDB Streams and a Lambda function is triggered to run a Systems Manager document. The last document runs two scripts on the instance: the first one removes the data source on Microsoft Power BI Online, and the second script removes the Athena ODBC DSN.

Clean up

To avoid incurring future charges, delete the CloudFormation stack and the resources that you deployed as part of this post.

Conclusion

ENGIE discovered significant value by using AWS services on top of Microsoft Power BI, enabling its global business units to analyze data in more productive ways. This post presented how ENGIE automated the process of creating reports using Athena with Microsoft Power BI.

The first part of the post described the architecture components and how to successfully create a dashboard using the NYC taxi dataset. The stack deployed uses only one EC2 instance for the Microsoft Power BI on-premises data gateway, but in production, you should consider creating a high-availability cluster of gateway installations, ideally in different Availability Zones.

The second part of this post deployed a demo environment and walked you through the steps to automate Athena data sources to be used on Microsoft Power BI. On the GitHub repository, you can find more scripts to help you to manage the users from your data sources on Microsoft Power BI and more.

For native access to your data in AWS without any downloads or servers, be sure to also check out Amazon QuickSight.


About the authors

Amine Belhabib is Hand-On Cloud Core Service Manager at ENGIE/ ENGIE IT. Innovative Cloud Surfer, helping ENGIE entities to accelerate their digital transformation and cloud first adoption strategy by designing, building, managing group cloud products and patterns in a use cases driven approach.

Armando Segnini is a Senior Data Architect with AWS Professional Services. He spends his time building scalable big data and analytics solutions for AWS Enterprise and Strategic customers. Armando also loves to travel with his family all around the world and take pictures of the places he visits.

Xavier Naunay is a Data Architect with AWS Professional Services. He is part of the AWS ProServe team, helping enterprise customers solve complex problems using AWS services. In his free time, he is either traveling or learning about technology and other cultures.

Amine El Mallem is a Senior Data/ML Ops Engineer in AWS Professional Services. He works with customers to design, automate, and build solutions on AWS for their business needs.

Anouar Zaaber is a Senior Engagement Manager in AWS Professional Services. He leads internal AWS, external partners, and customer teams to deliver AWS Cloud services that enable customers to realize their business outcomes.

https://aws.amazon.com/blogs/big-data/how-engie-automates-the-deployment-of-amazon-athena-data-sources-on-microsoft-power-bi/

Leave a Reply