Amazon EMR announces general availability of EMR Studio

At AWS re:Invent 2020, we announced the preview of Amazon EMR Studio, an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug applications written in R, Python, Scala, and PySpark.

Today, we’re excited to announce the general availability of EMR Studio, along with new features we’ve added since the preview: the ability to use the Amazon EMR console and AWS CloudFormation to create and configure a new EMR Studio for your team, support for Microsoft Active Directory (AD) as an identity provider, a new quick start notebook experience, the ability to launch the live Apache Spark UI directly from an EMR Studio notebook, and support for private Git repositories.

EMR Studio provides fully managed Jupyter notebooks, and tools like Spark UI and YARN Timeline Service to simplify debugging. EMR Studio uses AWS Single Sign-On and allows you to log in directly with your corporate credentials without signing in to the AWS Management Console. You can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, and run parameterized notebooks as part of scheduled workflows using orchestration services like Apache Airflow and Amazon Managed Workflows for Apache Airflow (Amazon MWAA).
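As a rough illustration of how a scheduled workflow might trigger a parameterized notebook, the sketch below assembles a request for the Amazon EMR StartNotebookExecution API, which boto3 exposes as `start_notebook_execution`. This is one possible approach, not necessarily how your orchestration is wired up. The Workspace ID, cluster ID, notebook path, role name, and parameter names are hypothetical placeholders.

```python
import json


def build_notebook_execution_request(editor_id, notebook_path, cluster_id,
                                     service_role, params):
    """Assemble keyword arguments for boto3's emr.start_notebook_execution."""
    return {
        "EditorId": editor_id,              # ID of the Workspace that holds the notebook
        "RelativePath": notebook_path,      # notebook file path inside the Workspace
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        # NotebookParams is sent as a JSON-encoded string of input parameters.
        "NotebookParams": json.dumps(params),
    }


def start_execution(request):
    """Submit the request. Needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works without AWS access

    emr = boto3.client("emr")
    return emr.start_notebook_execution(**request)["NotebookExecutionId"]


# Hypothetical values, for illustration only.
request = build_notebook_execution_request(
    editor_id="e-EXAMPLEWORKSPACEID",
    notebook_path="reports/daily_report.ipynb",
    cluster_id="j-EXAMPLECLUSTERID",
    service_role="EMR_Notebooks_DefaultRole",
    params={"run_date": "2021-04-01", "region": "us-east-1"},
)
```

An Airflow task could call `start_execution(request)` with values for each scheduled run; keeping the request builder separate from the API call makes the parameters easy to inspect before anything is submitted.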

With EMR Studio, you can run notebook code on Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), and take advantage of the performance-optimized EMR runtime for Apache Spark. You can set up EMR Studio to run applications on existing EMR clusters or create new clusters using AWS CloudFormation templates for Amazon EMR.

Several customers participated in the EMR Studio preview, including Mapbox, which provides a mapping and location cloud platform for developers.

“Mapbox provides precise location data and developer tools to change the way we navigate the world,” says Saba El-Hilo, Head of Data Platform, Mapbox. “EMR Studio allows us to prototype Spark applications and data science models that power large-scale data processing and transformations. The integrated development environment makes it easy for data scientists and engineers to perform ad hoc analysis and debug data processing workloads.”

New EMR Studio features

We’ve added new features based on feedback from preview customers to simplify both configuration and application development with EMR Studio.

Now you can use the EMR console, AWS CloudFormation, or the AWS Command Line Interface (AWS CLI) to create a new EMR Studio for your team. You can use the guided steps on the Amazon EMR console to easily set up security features and access control, and assign users or groups to an EMR Studio. You can also view Studio configurations and delete Studios in the UI. You can automate Studio creation in AWS CloudFormation by specifying the configurations and dependencies in a CloudFormation template. In addition, we’ve added support for Microsoft AD as an identity source that you can use with EMR Studio via AWS SSO.
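To make the CloudFormation option more concrete, here is a minimal sketch that generates a template containing a single `AWS::EMR::Studio` resource. The logical ID `TeamStudio` and every ID, ARN, and bucket name passed in are hypothetical placeholders; a real template would also need the IAM roles and security groups it refers to.

```python
import json


def studio_template(name, vpc_id, subnet_ids, service_role_arn, user_role_arn,
                    workspace_sg, engine_sg, s3_location):
    """Return a CloudFormation template (as a dict) declaring one EMR Studio."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "TeamStudio": {  # hypothetical logical ID
                "Type": "AWS::EMR::Studio",
                "Properties": {
                    "Name": name,
                    "AuthMode": "SSO",  # users sign in through AWS SSO
                    "VpcId": vpc_id,
                    "SubnetIds": subnet_ids,
                    "ServiceRole": service_role_arn,
                    "UserRole": user_role_arn,
                    "WorkspaceSecurityGroupId": workspace_sg,
                    "EngineSecurityGroupId": engine_sg,
                    "DefaultS3Location": s3_location,
                },
            }
        },
    }


# Placeholder values, for illustration only.
template = studio_template(
    name="analytics-team-studio",
    vpc_id="vpc-EXAMPLE",
    subnet_ids=["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
    service_role_arn="arn:aws:iam::111122223333:role/ExampleStudioServiceRole",
    user_role_arn="arn:aws:iam::111122223333:role/ExampleStudioUserRole",
    workspace_sg="sg-EXAMPLEWORKSPACE",
    engine_sg="sg-EXAMPLEENGINE",
    s3_location="s3://example-bucket/studio/",
)
template_json = json.dumps(template, indent=2)
```

Writing `template_json` to a file gives you something you can review and version before creating a stack from it.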

We also made it more flexible for administrators to create cluster templates. Now you can specify parameters that users can set when they create clusters using your template. As in the preview, you can also provide multiple cluster variations with a fixed set of parameters if you prefer.
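A parameterized cluster template could look something like the sketch below, which builds a CloudFormation template whose `Parameters` section exposes the core instance type and count to users while keeping everything else fixed. The parameter names, release label, application list, role names, and instance sizes are assumptions chosen for illustration, not recommended settings.

```python
import json


def cluster_template(allowed_instance_types, default_instance_type):
    """Return a cluster template (as a dict) with user-settable parameters."""
    if default_instance_type not in allowed_instance_types:
        raise ValueError("default instance type must be one of the allowed types")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ClusterName": {"Type": "String", "Default": "studio-cluster"},
            "CoreInstanceType": {
                "Type": "String",
                "AllowedValues": list(allowed_instance_types),
                "Default": default_instance_type,
            },
            "CoreInstanceCount": {
                "Type": "Number",
                "Default": 2,
                "MinValue": 1,
                "MaxValue": 10,
            },
        },
        "Resources": {
            "EmrCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": {"Ref": "ClusterName"},
                    "ReleaseLabel": "emr-6.2.0",           # placeholder release
                    "Applications": [{"Name": "Spark"}],   # placeholder application list
                    "JobFlowRole": "EMR_EC2_DefaultRole",  # placeholder role names
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "MasterInstanceGroup": {
                            "InstanceCount": 1,
                            "InstanceType": "m5.xlarge",
                        },
                        "CoreInstanceGroup": {
                            "InstanceCount": {"Ref": "CoreInstanceCount"},
                            "InstanceType": {"Ref": "CoreInstanceType"},
                        },
                    },
                },
            }
        },
    }


template = cluster_template(["m5.xlarge", "m5.2xlarge"], "m5.xlarge")
template_json = json.dumps(template, indent=2)
```

Restricting `CoreInstanceType` with `AllowedValues` lets administrators offer a choice without letting users pick arbitrary instance types; dropping the `Parameters` section and hard-coding the values gives the fixed-variation style available since the preview.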

We’ve added new sample notebooks that make it easier to start building data science applications in EMR Studio. Samples such as PySpark code that queries a Hive metastore and Python code for visualization give you a quick start: you can copy the notebooks into your EMR Studio workspace, run them as is, or edit them to meet your needs. For more information and a list of EMR Studio sample notebooks, see Configure a Workspace for EMR Studio.
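The following is not one of the bundled sample notebooks, only a minimal sketch of what PySpark code that queries a Hive metastore can look like. The helper names and the `default.sales` table are hypothetical, and the sketch assumes the cluster's Spark installation is configured to use a Hive metastore.

```python
def preview_query(database, table, limit=10):
    """Build a SELECT that previews rows from a table in the metastore."""
    return f"SELECT * FROM `{database}`.`{table}` LIMIT {int(limit)}"


def preview_table(spark, database, table, limit=10):
    """Run the preview query on a SparkSession and return the resulting DataFrame."""
    return spark.sql(preview_query(database, table, limit))


def hive_session(app_name="hive-metastore-sample"):
    """Create a Hive-enabled SparkSession if the kernel does not already provide one."""
    from pyspark.sql import SparkSession  # requires PySpark on the kernel

    return SparkSession.builder.appName(app_name).enableHiveSupport().getOrCreate()


# Typical notebook usage (hypothetical database and table names):
#   spark = hive_session()
#   spark.sql("SHOW DATABASES").show()
#   preview_table(spark, "default", "sales", limit=5).show()
```

Keeping the query builder separate from the session makes it easy to check the SQL before running it against a cluster.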

We’ve extended the collaboration features of EMR Studio so that notebooks can connect to GitHub, Bitbucket, GitLab, and AWS CodeCommit repositories on private networks, such as on-premises networks and customer VPCs. During the preview, you could only connect to repositories on public networks.

Finally, we’ve made application debugging easier by enabling you to launch the live Apache Spark UI directly from notebooks within EMR Studio. During the preview, you had to leave the notebook in EMR Studio, locate the application of interest on the cluster, and launch the Spark History Server. Now you can access logs and debug your application without leaving the notebook interface in EMR Studio.

Get started with EMR Studio

If you already use Amazon EMR, check out the tutorial Getting Started with the Amazon EMR Studio Interface.

If you’re running Apache Spark and other big data applications on premises or self-hosting them, learn about migrating to Amazon EMR in the Self-Service EMR Migration Guide and create a migration plan for your organization in a free workshop with Amazon EMR specialists.


About the Author

Shuang Li is a Senior Product Manager for Amazon EMR at AWS. She holds a doctoral degree in Computer Science and Engineering from Ohio State University.