Building a Cloud-based OLAP Cube and ETL Architecture with AWS Managed Services

For decades, enterprises used online analytical processing (OLAP) workloads to answer complex questions about their business by filtering and aggregating their data. These complex queries were compute and memory-intensive. This required teams to build and maintain complex extract, transform, and load (ETL) pipelines to model and organize data, oftentimes with commercial-grade analytics tools.

Continue reading

Hydrate your data lake with SaaS application data using Amazon AppFlow

Organizations today want to make data-driven decisions. The data could lie in multiple source systems, such as line of business applications, log files, connected devices, social media, and many more. As organizations adopt software as a service (SaaS) applications, data becomes increasingly fragmented and trapped in different “data islands.” To make decision-making easier, organizations are building data lakes, which is a centralized repository that allows you to store all your structured and unstructured data at any scale.

Continue reading

black screen with code

Improve query performance using AWS Glue partition indexes

While creating data lakes on the cloud, the data catalog is crucial to centralize metadata and make the data visible, searchable, and queryable for users. With the recent exponential growth of data volume, it becomes much more important to optimize data layout and maintain the metadata on cloud storage to keep the value of data lakes

Continue reading

Build secure encrypted data lakes with AWS Lake Formation

Maintaining customer data privacy, protection against intellectual property loss, and compliance with data protection laws are essential objectives of today’s organizations. To protect data against security threats, vulnerabilities within the organization, malicious software, or cyber criminality, organizations are increasingly encrypting their data.

Continue reading

Amazon Redshift ML Is Now Generally Available – Use SQL to Create Machine Learning Models and Make Predictions from Your Data

With Amazon Redshift , you can use SQL to query and combine exabytes of structured and semi-structured data across your data warehouse, operational databases , and data lake . Now that AQUA (Advanced Query Accelerator) is generally available , you can improve the performance of your queries by up to 10 times with no additional costs and no code changes.

Continue reading

application blur business code

Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala

The best way to get timely insights and react quickly to new information you receive from your business and your applications is to analyze streaming data . This is data that must usually be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and can be used for a variety of analytics including correlations, aggregations, filtering, and sampling.

Continue reading

Introducing Dataplex—an intelligent data fabric for analytics at scale

Enterprises are struggling to make high quality data easily discoverable and accessible for analytics, across multiple silos, to a growing number of people and tools within their organization. They are often forced to make tradeoffs—to move and duplicate data across silos to enable diverse analytics use cases or leave their data distributed but limit the agility of decisions.

Continue reading

application blur business code

Set up CI/CD pipelines for AWS Glue DataBrew using AWS Developer Tools

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery (CI/CD). This enables teams to securely store and version code, maintain parity between development and production environments, and achieve end-to-end automation of the release cycle, including building, testing, and deploying to production.

Continue reading

1 32 33 34 35 36 41