IT data analytics Archives

Push Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch logs

April 18, 2023 GeneAka

Monitoring the logs generated from the jobs deployed on EMR clusters is essential to help detect critical issues in real time and identify root causes quickly. In this post, we create an EMR cluster and centralize the EMR

How Morningstar used tag-based access controls in AWS Lake Formation to manage permissions for an Amazon Redshift data warehouse

April 11, 2023 GeneAka

To solve this problem, we needed to build our own solution to convert the tag-based policies in Lake Formation into grants and revokes in the resource-based entitlements in Amazon Redshift. The tag-based access

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

April 11, 2023 GeneAka

Showpad built new customer-facing embedded dashboards within Showpad eOS™ and migrated its legacy dashboards to Amazon QuickSight, a unified BI service providing modern interactive dashboards,

Data Vault on Snowflake: Feature Engineering and Business Vault

April 4, 2023 GeneAka

Batch/file-based data is modeled into the raw vault table structures as the hub, link, and satellite tables illustrated at the beginning of this post. The reusable and shareable (by more than one ML model) feature values

Perform accent-insensitive search using OpenSearch

April 4, 2023 GeneAka

In order to enable our diacritic-insensitive search, we configure custom analyzers that use the ASCII folding token filter. Accent-insensitive search, also called diacritics-agnostic search, is where search results are

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

April 4, 2023 GeneAka

The team is tasked with implementing SCD Type 2 functionality for identifying new, updated, and deleted records from the source, and with preserving the historical changes in the data lake. An AWS Glue job (Delta

Index file content and metadata by using Azure Cognitive Search

April 4, 2023 GeneAka

This scenario uses indexers in Azure Cognitive Search to automatically discover new content in supported data sources, like blob and table storage, and then add it to the search index. This article uses an example

Accelerating revenue growth with real-time analytics: Poshmark’s journey

March 28, 2023 GeneAka

We discuss how to create such a solution using Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon Kinesis Data Analytics for Apache Flink; the design decisions

How SafetyCulture scales unpredictable dbt Cloud workloads in a cost-effective manner with Amazon Redshift

March 21, 2023 GeneAka

In this post, we share the solution SafetyCulture used to scale unpredictable dbt Cloud workloads in a cost-effective manner with Amazon Redshift. A source of unpredictable workloads is dbt Cloud,

Extend geospatial queries in Amazon Athena with UDFs and AWS Lambda

March 21, 2023 GeneAka

A SageMaker notebook uses the AWS SDK for pandas package to run a SQL query in Athena, including the UDF. We then use an Athena user-defined function (UDF) to determine which hexagon each historical

Accelerate data insights with Elastic and Amazon Kinesis Data Firehose

March 21, 2023 GeneAka

With this new integration, you can set up this configuration directly from your VPC flow logs to Kinesis Data Firehose and into Elastic Cloud. In the past, users would have to use an AWS Lambda function to transform the incoming data from VPC flow logs into an Amazon Simple Storage Service (Amazon


Tag: IT data analytics

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Build an end-to-end change data capture with Amazon MSK Connect and AWS Glue Schema Registry

Use Apache Iceberg in a data lake to support incremental data processing