The results of the 18th annual KDnuggets Software Poll were recently published. The poll asks: “What Predictive Analytics, Data Mining, Data Science Software/Tools have you used in the past 12 months?” This year it attracted 2,900 voters. It is worth mentioning that the poll sometimes attracts controversy due to excessive voting by some vendors. See all the data at KDnuggets.
Some of the most relevant findings are:
- Python has now overtaken R as a data science tool, barely but noticeably (53% vs. 52% usage, and Python grew 15% while R grew only 6%)
- There are now 2 newcomers in the top 10 list: TensorFlow and Anaconda
- Use of Excel for Analytics purposes decreased by 16%
- In terms of programming languages, Python, R and SQL run the show with usage of all 3 growing
- The Big Data Tools list was simplified to only 4 categories: Hadoop Open Source, Hadoop Commercial, SQL on Hadoop Tools and Spark. The fastest-growing category is SQL on Hadoop Tools, while usage of Hadoop Open Source is decreasing
TOP SOFTWARE TOOLS
We have 2 newcomers to the top 10 this year: Anaconda and TensorFlow
Top 2 tools:
- Use: Python (53%) and R (52%)
- Growth: TensorFlow (197%), Anaconda (37%)
TOP LANGUAGES
Top 2 languages:
- Use: Python (53%) and R (52%)
- Growth: Python (15%), R (6%)
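To see why these growth figures matter, here is a minimal sketch that backs out last year's shares from the numbers quoted above. It assumes the growth percentages are relative changes in share (this year's share divided by last year's, minus 1), which the figures here do not state explicitly. The helper name `implied_previous_share` is my own and is only for illustration.

```python
def implied_previous_share(current_share: float, growth_pct: float) -> float:
    """Back out the prior-year share.

    Assumption: growth_pct is a relative change in share,
    i.e. current = previous * (1 + growth_pct / 100).
    """
    return current_share / (1 + growth_pct / 100)


# Figures quoted in this post: (current share in %, growth in %)
languages = {"Python": (53, 15), "R": (52, 6)}

previous = {
    name: implied_previous_share(share, growth)
    for name, (share, growth) in languages.items()
}

for name, share in previous.items():
    print(f"{name}: roughly {share:.1f}% a year earlier (implied)")
```

Under that assumption, R would have been ahead of Python a year earlier (about 49% vs. 46%), which is consistent with Python only now moving into first place.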
TOP BIG DATA TOOLS
The tools on the survey have been simplified to 4: Hadoop Open Source, Hadoop Commercial, Spark and SQL on Hadoop Tools.
Top 2 Big Data tools:
- Use: Spark (23%) and Hadoop Open Source (15%)
- Growth: SQL on Hadoop Tools (41%), Spark (5%)
It is important to note the 32% decrease in usage of Hadoop Open Source. I am not sure whether this reflects a real decrease or is a result of the survey splitting the Hadoop category in 2: Open Source and Commercial. Part of this “decrease” could be attributed to the fact that there are now 2 categories instead of 1.
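A rough way to reason about this is to ask how large the Hadoop Commercial share would need to be for the split alone to explain the drop. The sketch below is only a back-of-the-envelope check under several assumptions of mine: the 32% is a relative change in share, last year's single Hadoop category is comparable to the sum of the two new ones, and no voter picked both. The Hadoop Commercial share is not quoted in this post, so it is left as a parameter rather than filled in.

```python
# Figures quoted in this post
OPEN_SOURCE_NOW = 15.0      # Hadoop Open Source share this year, in %
OPEN_SOURCE_CHANGE = -32.0  # reported change, in %

# Assumption: the change is relative, so previous = current / (1 + change / 100)
previous_hadoop = OPEN_SOURCE_NOW / (1 + OPEN_SOURCE_CHANGE / 100)

# Share (in percentage points) that Hadoop Commercial would need to hold
# for the combined category to match last year's implied level.
break_even_commercial = previous_hadoop - OPEN_SOURCE_NOW


def combined_change(commercial_share: float) -> float:
    """Relative change (%) of Open Source + Commercial vs. last year's implied share.

    Assumption: no voter selected both categories, so the shares simply add up.
    """
    combined = OPEN_SOURCE_NOW + commercial_share
    return (combined / previous_hadoop - 1) * 100


print(f"Implied Hadoop share last year: {previous_hadoop:.1f}%")
print(f"Commercial share needed to explain the whole drop: {break_even_commercial:.1f} points")
```

With these assumptions, last year's implied share is about 22%, so Hadoop Commercial would need roughly 7 points for the split to account for the entire decrease. Calling `combined_change` with the actual Hadoop Commercial share from the KDnuggets data would show how much of the drop remains once both categories are added together.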