Tech 10: Fulfilling The Big Promises Of Big Data In The Cloud

Here's a look at ten new software offerings that improve business users' and applications' access to rapidly expanding cloud data sources.

Going Big In The Cloud

Spending on big data and business analytics products is expected to reach $210 billion by 2020, according to market researcher IDC, up from $150.8 billion in 2017. While that growth is fueling a continuous stream of startups in the big data arena, it's also pushing more established players to maintain a rapid pace of developing and delivering new and updated big data products.

Not surprisingly, many of those new and updated products are focused on the cloud. That includes software and services for running and managing business intelligence, database, data warehouse and data lake systems on cloud platforms. It also includes software for building pipelines to cloud-based data systems and tools for improving data accuracy and controlling access to cloud-based data.

Here’s a look at 10 recently debuted products that focus on working with big data in the cloud.

DataStax Constellation

DataStax Constellation is a new cloud platform with services for developing and deploying cloud applications architected for the Apache Cassandra database. The platform initially offers two services: DataStax Apache Cassandra-as-a-Service and DataStax Insights, the latter a performance management and monitoring tool for DataStax Constellation and for DataStax Enterprise, the vendor’s flagship Cassandra-based database product.

Domo For Amazon Web Services

Domo for AWS is a purpose-built software package that makes data from nearly two dozen AWS services securely accessible to anyone through the Domo business intelligence system. The software includes more than 20 discrete connectors to AWS services including S3, Redshift, RDS, Athena, Aurora, DynamoDB and CloudWatch, as well as AWS Billing and Cost Management. The software creates a prebuilt view of the data in those services and makes it available to users across an organization.

Dremio Data Lake Engine

Dremio’s new Data Lake Engine for AWS, Azure and hybrid cloud environments, part of the Dremio 4.0 release, significantly improves the performance of queries against data lakes—the huge stores of data many organizations have built on Hadoop, Amazon Web Services’ S3 and Microsoft’s Azure Data Lake Services. Technology advances such as columnar caching, predictive pipelining and a new execution engine kernel provide the performance boost. The engine also sports built-in connectors for analysis tools like Tableau, Microsoft Power BI and Looker.

MemSQL 7.0/MemSQL Helios

MemSQL develops the MemSQL distributed SQL database for operational analytics and cloud-native applications. The new MemSQL Helios, now in private preview, is an on-demand, fully managed cloud service that provides a scalable data platform for operational analytics, machine learning and artificial intelligence tasks. MemSQL 7.0, now in beta, offers “SingleStore” data management, which supports both rowstore and columnstore workloads, along with “system of record” features such as incremental backup and synchronous replication.
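The rowstore/columnstore choice described above can be sketched as the DDL a developer would issue over MemSQL's MySQL-compatible wire protocol. This is an illustrative sketch, not vendor documentation: the table and column names are hypothetical, and the `USING CLUSTERED COLUMNSTORE` clause reflects MemSQL's pre-7.0 columnstore syntax.

```python
# Sketch: choosing rowstore vs. columnstore in MemSQL via plain DDL.
# Table/column names are hypothetical placeholders.

# In-memory rowstore: the default table type, suited to point lookups
# and transactional (OLTP-style) workloads.
rowstore_ddl = """
CREATE TABLE orders_live (
    order_id BIGINT PRIMARY KEY,
    status   VARCHAR(16)
);
"""

# Disk-backed columnstore: suited to large scans and analytical
# aggregates; declared with a clustered columnstore key.
columnstore_ddl = """
CREATE TABLE orders_history (
    order_id BIGINT,
    status   VARCHAR(16),
    KEY (order_id) USING CLUSTERED COLUMNSTORE
);
"""
```

Because MemSQL speaks the MySQL wire protocol, either statement could be executed with any MySQL-compatible client or driver.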

MongoDB Atlas Data Lake

MongoDB launched beta versions of its MongoDB Atlas Data Lake and MongoDB Atlas Full-Text Search software—new services that expand the ways developers can use the MongoDB database to work with data. Atlas Data Lake accesses and queries data in AWS S3 storage buckets in any format (JSON, Avro, Parquet and others) using the MongoDB Query Language. Atlas Full-Text Search provides rich text search capabilities (based on the Apache Lucene technology) against MongoDB databases.
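Because Atlas Data Lake maps S3 paths onto virtual databases and collections, a standard MongoDB Query Language aggregation runs unchanged against files in a bucket. The sketch below builds such a pipeline in Python; the database, collection, field names and connection string are hypothetical placeholders.

```python
# Sketch: an MQL aggregation that Atlas Data Lake could run directly
# against JSON/Parquet files in S3. Field and collection names are
# hypothetical.
pipeline = [
    {"$match": {"status": "shipped"}},            # filter raw S3 records
    {"$group": {"_id": "$region",                 # aggregate by region
                "orders": {"$sum": 1}}},
    {"$sort": {"orders": -1}},                    # busiest regions first
]

# With pymongo installed and an Atlas Data Lake connection string,
# the same pipeline would be submitted like any MongoDB query:
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://<user>:<pass>@<datalake-uri>/")
#   results = list(client["sales"]["orders"].aggregate(pipeline))
```

The point of the service is exactly this symmetry: code written against a live MongoDB database needs no changes to query archived data sitting in S3.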

Naveego Complete Data Accuracy Platform

The Naveego Complete Data Accuracy Platform is a distributed data accuracy system that proactively manages, detects and eliminates customer data accuracy problems across multiple enterprise data sources. A new release of the platform provides self-service capabilities that help business users undertake master data management and “golden record” tasks. The accompanying Naveego Accelerator is a data health analysis toolkit that calculates the percentage of records with consistency errors that can impact business operations and profitability.

Okera Policy Builder

Data lakes can quickly become legal swamps if care isn’t taken to ensure they meet data security and compliance requirements. To tackle that, Okera offers the Okera Active Data Access Platform, a data lake security and governance system. Okera Policy Builder, an addition to that platform, provides the tools that data stewards and data governance teams use to create and manage detailed data access control policies that dictate who can access what data.

Qlik Sense Enterprise On Qlik Cloud Services

Qlik is now allowing businesses and organizations to deploy its flagship Qlik Sense Enterprise entirely on Qlik Cloud Services, the vendor’s Software-as-a-Service environment, making it easier for partners and customers to run the software as SaaS. Qlik Sense Enterprise also can be deployed on Kubernetes in a private cloud or on a public cloud platform.

Snowflake On Google Cloud

Snowflake Computing will provide its cloud-native data warehouse services on the Google Cloud Platform later this year, completing a trifecta alongside Amazon Web Services and Microsoft Azure, where the services already run. Offering Snowflake Data Warehouse on Google Cloud means Snowflake can better support customers’ multi-cloud strategies. The Snowflake Data Warehouse is also available on the Microsoft Azure Government platform for use by U.S. government agencies.

Talend Pipeline Designer

Data integration software developer Talend debuted Pipeline Designer, a web-based graphical design tool that data engineers and software developers use to build data pipelines integrating data across hybrid cloud and multi-cloud environments, on-premises databases and other data sources. Pipeline Designer is an addition to Talend Cloud, the company’s Platform-as-a-Service integration system.