Big Data

FOSS Project Spotlight: Sawmill, the Data Processing Project

Introducing Sawmill, an open-source Java library for enriching, transforming and filtering JSON documents. If you're into centralized logging, you are probably familiar with the ELK Stack: Elasticsearch, Logstash and Kibana. Just in case you're not, ELK (or Elastic Stack, as it's being renamed these days) is a package of three open-source components, each responsible for a different task or stage in a data pipeline.

InfluxData

What is ephemeral data, you ask? InfluxData can supply the answer, because handling it is the business of the company's open-source InfluxData platform, which is custom-built for metrics and events.

Learning Data Science

In my last few articles, I've written about data science and machine learning. In case my enthusiasm wasn't obvious from my writing, let me say it plainly: it has been a long time since I last encountered a technology that was so poised to revolutionize the world in which we live.

Datamation's "Leading Big Data Companies" Report

The Big Data market is in a period of remarkable transition. If keeping tabs on this dynamic sector is in your wheelhouse, Datamation has made your homework easier by developing "Leading Big Data Companies", a report that provides "a snapshot of a market sector in transition".

Novelty and Outlier Detection

In my last few articles, I've looked at a number of ways machine learning can help make predictions. The basic idea is that you create a model using existing data and then ask that model to predict an outcome based on new data.

Classifying Text

In my last few articles, I've looked at several ways one can apply machine learning, both supervised and unsupervised. This time, I want to bring your attention to a surprisingly simple—but powerful and widespread—use of machine learning, namely document classification.

Unsupervised Learning

In my last few articles, I've looked into machine learning and how you can build a model that describes the world in some way. All of the examples I looked at were of "supervised learning", meaning that you loaded data that already had been categorized or classified in some way, and then created a model that "learned" the ways the inputs mapped to the outputs.

JMR SiloStor NVMe SSD Drives

Compute-intensive workflows are the environments in which the newly developed JMR SiloStor NVMe family of SSD drives is designed to show its colors.

Kodiak Data's MemCloud

Scientists working with big data regularly confront the high cost of acquiring the computational power needed to push the boundaries and innovate in data science.

Teaching Your Computer

As I have written in my last two articles ("Machine Learning Everywhere" and "Preparing Data for Machine Learning"), machine learning is influencing our lives in numerous ways.

iguazio's Continuous Analytics Solution

In industries like financial services, healthcare and IoT, organizations are faced with the challenge of complexity across the entire data lifecycle. To help enterprises solve big data operational challenges and generate real-time insights, iguazio has developed a new Continuous Analytics Solution.

CyKick Labs Ltd.'s Telepath

When a shopper enters a store, the retailer doesn't know if the person will simply browse, make purchases, shoplift or hold up the register. The same goes for visitors to a website. The challenge is to prevent and stop the bad guys without hindering beneficial customer transactions.

Preparing Data for Machine Learning

When I go to Amazon.com, the online store often recommends products I should buy. I know I'm not alone in thinking that these recommendations can be rather spooky—often they're for products I've already bought elsewhere or that I was thinking of buying. How does Amazon do it?

MultiTaction's MT Canvus-Connect

"A new era in visual collaboration" is the promise of MT Canvus-Connect, MultiTaction's new real-time software that enables visual touchscreen collaboration across remote locations.

How to Fix the Edge

In December 2016, Peter Levine of the venture firm Andreessen Horowitz published a post with a video titled "Return to the Edge and the End of Cloud Computing". In it, he outlines a pendulum swing between centralized and distributed computing that goes like this:

SUSE Linux Enterprise High Availability Extension

Historically, data replication has been available only piecemeal through proprietary vendors. In a quest to remediate history, SUSE and partner LINBIT announced a solution that promises to change the economics of data replication.

Machine Learning Everywhere

The field of statistics typically has had a bad reputation. It's seen as difficult, boring and even a bit useless. Many of my friends had to take statistics courses in graduate school, so that they could analyze and report on their research. To many of them, the classes were a form of nerdy, boring torture.

iguazio's Enterprise Data Cloud

The description of iguazio's new flagship Enterprise Data Cloud platform is bold and simple: the world's fastest, simplest and lowest-cost enterprise data cloud. iguazio adds that, to unleash the full potential of megatrend applications and analytics for big data, IoT and cloud-native applications, it has pioneered a new service-driven approach to enterprise d