What Data Science Tools Have You Used in the Past 12 Months? KDnuggets Poll Results Are Out!

The results of the 18th annual KDnuggets Software Poll were recently published. The poll asks “What Predictive Analytics, Data Mining, Data Science Software/Tools have you used in the past 12 months?” This year it attracted 2,900 voters. It is also worth mentioning that the poll sometimes attracts controversy due to excessive voting by some vendors. All the data is available at KDnuggets.

Some of the most relevant findings are:

  • Python has now overtaken R as a data science tool, barely but noticeably: 53% of voters use Python versus 52% for R, and Python’s usage grew 15% while R’s grew only 6%
  • Two newcomers joined the top 10 list: TensorFlow and Anaconda
  • Use of Excel for analytics decreased by 16%
  • Among programming languages, Python, R and SQL run the show, and usage of all 3 is growing
  • The Big Data tools were simplified to only 4 categories: Hadoop Open Source, Hadoop Commercial, SQL on Hadoop Tools and Spark. The fastest-growing is SQL on Hadoop Tools, while usage of Hadoop Open Source is decreasing
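The growth percentages in these results appear to be relative changes in a tool’s share of voters compared with the previous year’s poll, not percentage-point differences (TensorFlow’s 197% growth could not be a change in points, since its usage is 20%). A minimal Python sketch of that arithmetic, assuming this definition; the helper name `relative_growth` and the input shares are hypothetical, not poll data:

```python
def relative_growth(share_before: float, share_now: float) -> float:
    """Relative change in usage share, as a percentage of the earlier share.

    Hypothetical helper: assumes growth = (now - before) / before.
    """
    if share_before <= 0:
        raise ValueError("previous share must be positive")
    return (share_now - share_before) / share_before * 100


# Hypothetical shares (fraction of voters), not actual poll figures.
print(f"{relative_growth(0.40, 0.46):.1f}%")  # 15.0%
print(f"{relative_growth(0.50, 0.53):.1f}%")  # 6.0%
```

Because growth is measured against the earlier share, the same 5-point gain is 100% growth for a tool going from 5% to 10%, but only 10% growth for a tool going from 50% to 55%.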

TOP SOFTWARE TOOLS

We have 2 newcomers this year: Anaconda and TensorFlow

Top 2 tools:

  • Use: Python (53%) and R (52%)
  • Growth: TensorFlow (197%), Anaconda (37%)

TOP LANGUAGES

Top 2 languages:

  • Use: Python (53%) and R (52%)
  • Growth: Python (15%), R (6%)

TOP BIG DATA TOOLS


The Big Data tools in the survey have been simplified to 4: Hadoop Open Source, Hadoop Commercial, Spark and SQL on Hadoop Tools.

Top 2 Big Data tools:

  • Use: Spark (23%) and Hadoop Open Source (15%)
  • Growth: SQL on Hadoop Tools (41%), Spark (5%)

It is important to notice the 32% decrease in usage of Hadoop Open Source. I am not sure whether this is a real decrease or a result of the survey splitting the Hadoop category in 2: Open Source and Commercial. Part of this “decrease” could be attributed to the fact that voters who previously picked a single Hadoop option are now spread across 2 categories instead of 1.
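How a category split alone can produce an apparent decline can be shown with a small Python sketch. The counts are entirely hypothetical (not poll data), and the sketch assumes the number of Hadoop users stays the same while some of them now tick only “Hadoop Commercial”; the function name is also hypothetical.

```python
def share_change_after_split(hadoop_users: int, only_commercial: int, total_voters: int) -> float:
    """Percent change in the "Hadoop Open Source" share after a category split.

    Hypothetical scenario: Hadoop usage is unchanged, but `only_commercial`
    of those users now tick only "Hadoop Commercial".
    """
    share_before = hadoop_users / total_voters  # single "Hadoop" option
    share_after = (hadoop_users - only_commercial) / total_voters  # open-source option
    return (share_after - share_before) / share_before * 100


# Hypothetical counts: 200 of 1,000 voters use Hadoop in both years,
# and 40 of them now pick only the commercial option.
print(f"{share_change_after_split(200, 40, 1000):.0f}%")  # -20%
```

Note that the total number of voters cancels out: the apparent drop depends only on the fraction of Hadoop users who moved to the commercial option, even though nobody stopped using Hadoop.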

DEEP LEARNING TOOLS

Top Deep Learning Tools:

  • Use: TensorFlow (20%) and Keras (9.5%)
  • Growth: Microsoft CNTK (278%), MXNet (200%)

What is the Business Value of Big Data? – The Three Things Series

*** The 3 Things Series aims to simplify – sometimes even oversimplify – technology concepts so that you learn 3 things about a topic ***. Opinions are my own.

Organizations typically embark on Big Data projects with 3 goals in mind: cost reduction, improved decision making and the ability to create new products and services.


1- Cost Reduction 

As the volume and complexity of data in organizations increase, so does the cost of storing and processing it. Organizations then have to decide how much data to keep available for analysis, and how much “historic” data to move to tape or other less expensive storage. The problem with this strategy is that limiting the data that can be analyzed also limits the insight that can be derived from it.

In recent years, technology developments, especially in Open Source, have made cost reduction a reality through inexpensive technology such as Hadoop clusters. (Hadoop is a unified storage and processing environment that allows data and data processing to be distributed across multiple computers.) Hadoop clusters give organizations the ability to keep more data available for analysis at a lower cost, and to easily add complex data types (images, sound, etc.) to the pool of data to be analyzed.
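The idea of distributing both data and processing can be illustrated with a short Python sketch: split the data into partitions (standing in for the computers in a cluster), process each partition independently, then combine the partial results. This is a single-machine simulation using only the standard library, not Hadoop’s API; the sales records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sales records: (store, product) pairs.
RECORDS = [
    ("north", "coat"), ("south", "scarf"), ("north", "scarf"),
    ("east", "coat"), ("south", "coat"), ("east", "gloves"),
]


def partition(records, n_parts):
    """Split records round-robin into n_parts chunks, one per simulated node."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_products(chunk):
    """Process one partition locally: count units sold per product."""
    return Counter(product for _, product in chunk)


def combine(partials):
    """Merge the partial counts from every node into one result."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total


if __name__ == "__main__":
    chunks = partition(RECORDS, n_parts=3)
    # On a real cluster each chunk would be processed on the machine that
    # stores it; here we simply loop over the chunks.
    partials = [count_products(chunk) for chunk in chunks]
    print(combine(partials))  # Counter({'coat': 3, 'scarf': 2, 'gloves': 1})
```

Because each partition is processed independently, adding capacity is a matter of adding more (inexpensive) machines and partitions rather than buying a bigger single server.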

2- Improved Decision Making

Data analysis can be significantly improved by adding new data sources and new data types to traditional data. For example, a data-driven retailer may see significant benefits in its inventory planning processes if a new data source, like weather data, is added to the model to better predict sales and inventory requirements: an enriched model may be able to predict shortages of winter clothing by incorporating temperature into the existing models. Additional benefits can be achieved if more complex data is analyzed. For example, this same retailer may better target its ads on social media if it evaluates not only its clients’ purchasing history, but also the actions they take on social media to interact with its brands and those of its competitors.

3-  Development of New Products and Services

The most strategic and innovative business benefits will probably come from using new data, or new sources of data, to create new products and services. Let’s think for a minute about the data our cars generate (yes, we don’t necessarily see it, but more and more cars are equipped with sensors that collect a lot of data about our driving history). Using this data, insurance companies can offer policies that are dynamically priced based on an individual’s driving history (which is good news for you only if you are a safe driver, of course). Integrating weather data can also bring tremendous savings to an insurance company. Some insurance companies have been able to achieve significant savings per claim by letting their clients know that a storm is coming and recommending that they don’t leave their cars exposed to the elements (again, assuming that as a client you follow your insurance company’s recommendations).

In summary, when thinking of the business value of Big Data, think of three areas of value:

  • Cost reductions
  • Improved decision making
  • Ability to create new products and services

 

What is Big Data? (The 3 Things Series)

*** The 3 Things Series aims to simplify – sometimes even oversimplify – technology concepts so that you learn 3 things about a topic ***. Opinions are my own.

The technology industry is full of “buzzwords”, and Big Data has been one of the most used in recent years. Organizations have always dealt with data and stored it in databases, but the chart below shows how Google searches for “Big Data” have changed over the years compared with searches for “Databases”.

 

Chart: Google searches for “Databases” vs. “Big Data” over time

 

Big Data generally refers to the ability to gather, store, manage, manipulate and, most importantly, get insights out of vast amounts of data. The typical question is “how big does data need to be to be considered Big?”, and the answer is… it depends. When it comes to size, one organization’s Big Data may be another organization’s small data.

There are 3 things to remember that define “Big Data”:

  • Volume. This refers to size: if you are capturing vast amounts of information, you probably have Big Data on your hands
  • Velocity. Are you working with data at rest or data in motion? For example, if you are analyzing sales figures for the past year, that data is at rest (it is not changing constantly). If, on the other hand, you are analyzing tweets to understand how your clients are reacting to a product announcement, that is data in motion, as it is continuously changing. It may not necessarily be big if you are looking at daily data, but the fact that it is in motion is relevant to the definition of Big Data
  • Variety. As the ability to capture, store and analyze more data has increased, so has the interest in analyzing data that is more complex in nature. For example, an insurance company may want to analyze the recordings of customer service calls to determine what characteristics of the conversation led to a policy sale, a retailer may want to analyze videos to determine how people navigate the store and how that impacts sales, or a hospital may want to analyze x-rays to find patterns and correlations between common symptoms in patients.
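The difference between data at rest and data in motion described under Velocity can be sketched in Python. A batch computation sees the whole, fixed dataset at once (data at rest), while a streaming computation must update its answer as each new record arrives (data in motion). The example keeps a running average of hypothetical sentiment scores for incoming tweets; the class name and the scores are illustrative assumptions, not part of any particular tool.

```python
from statistics import mean


def batch_average(values):
    """Data at rest: the full dataset is available, so compute once."""
    return mean(values)


class RunningAverage:
    """Data in motion: update the average incrementally as records arrive."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        # Incremental mean: move the average toward x by 1/count of the gap,
        # so the full history never has to be stored.
        self.value += (x - self.value) / self.count
        return self.value


# Hypothetical sentiment scores (-1 negative to +1 positive) for tweets.
scores = [0.5, -0.2, 0.9, 0.1]

print(round(batch_average(scores), 3))  # 0.325

stream = RunningAverage()
for s in scores:  # in practice these would arrive one at a time
    stream.update(s)
print(round(stream.value, 3))  # 0.325
```

Both approaches reach the same answer here; the streaming version simply never needs the whole dataset in memory, and it has an up-to-date answer after every new tweet.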

So when it comes to the definition of Big Data, remember 3 things, or the 3 Vs:

  • Volume (size)
  • Velocity (frequency of data updates during analysis)
  • Variety (complexity of data to analyze: images, videos, texts, log files, etc.)