zenkay/bigdata-ecosystem

Big Data Ecosystem Dataset

Incomplete-but-useful list of big-data related projects packed into a JSON dataset.

Related projects:

Categories

Frameworks

  • Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).

Distributed Programming

  • AddThis Hydra - distributed data processing and storage system originally developed at AddThis.
  • Akela - Mozilla's utility library for Hadoop, HBase, Pig, etc.
  • AMPLab SIMR - run Spark on Hadoop MapReduce v1.
  • AMPLab Succinct - Enabling Queries on Compressed Data.
  • Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
  • Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
  • Apache Flink - high-performance runtime with automatic program optimization.
  • Apache Gora - framework for in-memory data model and persistence.
  • Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
  • Apache MapReduce - programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
  • Apache Pig - high level language to express data analysis programs for Hadoop.
  • Apache S4 - framework for stream processing, implementation of S4.
  • Apache Spark - framework for in-memory cluster computing.
  • Apache Spark Streaming - framework for stream processing, part of Spark.
  • Apache Storm - framework for stream processing by Twitter, also available on YARN.
  • Apache Tez - application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
  • Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
  • Cascalog - data processing and querying library.
  • Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
  • Concurrent Cascading - framework for data management/analytics on Hadoop.
  • Damballa Parkour - MapReduce library for Clojure.
  • Datasalt Pangool - alternative MapReduce paradigm.
  • DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as non-blocking a way as possible, with minimal overhead and impact on performance.
  • DistributedR - scalable high-performance platform for the R language.
  • eBay Oink - REST-based interface for Pig execution.
  • Facebook Corona - Hadoop enhancement that removes the single point of failure.
  • Facebook Peregrine - MapReduce framework.
  • Facebook Scuba - distributed in-memory datastore.
  • Geotrellis - geographic data processing engine for high performance applications.
  • GIS Tools for Hadoop - Big Data Spatial Analytics for the Hadoop Framework.
  • Google Dataflow - create data pipelines to help ingest, transform and analyze data.
  • Google MapReduce - MapReduce framework.
  • Google MillWheel - fault-tolerant stream processing framework.
  • HParser - data parsing transformation environment optimized for Hadoop.
  • IBM Streams - advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources.
  • JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
  • Kite - set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
  • Kryo - Java serialization and cloning: fast, efficient, automatic.
  • Lipstick - Pig workflow visualization tool.
  • Metamarkets Druid - framework for real-time analysis of large datasets.
  • Netflix Aegisthus - bulk data pipeline out of Cassandra. Implements a reader for the SSTable format and provides a MapReduce program to create a compacted snapshot of the data contained in a column family.
  • Netflix Lipstick - Pig Visualization framework.
  • Netflix Mantis - Event Stream Processing System.
  • Netflix PigPen - map-reduce for Clojure that compiles to Apache Pig.
  • Netflix STAASH - language-agnostic as well as storage-agnostic web interface for storing data into persistent storage systems.
  • Netflix Zeno - Netflix's In-Memory Data Propagation Framework.
  • Nokia Disco - MapReduce framework developed by Nokia.
  • PigPen - PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
  • Pinterest Pinlater - asynchronous job execution system.
  • Pydoop - Python MapReduce and HDFS API for Hadoop.
  • ScaleOut hServer - fast, scalable in-memory data grid for Hadoop.
  • SeqPig - simple and scalable scripting for large sequencing datasets (e.g., bioinformatics data) in Hadoop.
  • SigmoidAnalytics Spork - Pig on Apache Spark.
  • SpatialHadoop - SpatialHadoop is a MapReduce extension to Apache Hadoop designed specially to work with spatial data.
  • Spring for Apache Hadoop - unified configuration model and easy-to-use APIs for using HDFS, MapReduce, Pig, and Hive.
  • SQLStream Blaze - stream processing platform.
  • Stratio Streaming - the union of a real-time messaging bus with a complex event processing engine using Spark Streaming.
  • Stratosphere - general purpose cluster computing framework.
  • Streamdrill - useful for counting activities of event streams over different time windows and finding the most active ones.
  • Teradata QueryGrid - data-access layer that can orchestrate multiple modes of analysis across multiple databases plus Hadoop.
  • TIBCO ActiveSpaces - in-memory data grid.
  • Torch - Scientific computing for LuaJIT.
  • Twitter Scalding - Scala library for MapReduce jobs, built on Cascading.
  • Twitter Summingbird - Streaming MapReduce with Scalding and Storm, by Twitter.
  • Twitter TSAR - TimeSeries AggregatoR by Twitter.
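Several entries above (Apache MapReduce, Google MapReduce, Nokia Disco, Pydoop) are built around the MapReduce programming model: a map function emits intermediate key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines the values for each key. The single-process Python sketch below illustrates that data flow with the classic word-count example. It is not the API of Hadoop or of any other listed project; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute the map and reduce calls across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in one process."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

Because each `map_phase` call sees only one record and each `reduce_phase` call sees only one key, both steps can run in parallel on different machines; only the shuffle has to move data between them.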

Distributed Filesystem

Key-Map Data Model

  • Actian Vector - column-oriented analytic database.
  • Apache Accumulo - distributed key/value store, built on Hadoop.
  • Apache Cassandra - column-oriented distributed datastore, inspired by BigTable.
  • Apache HBase - column-oriented distributed datastore, inspired by BigTable.
  • Facebook HydraBase - evolution of HBase made by Facebook.
  • Google BigTable - column-oriented distributed datastore.
  • Google Cloud Datastore - fully managed, schemaless database for storing non-relational data over BigTable.
  • Hypertable - column-oriented distributed datastore, inspired by BigTable.
  • InfiniDB - accessed through a MySQL interface; uses massively parallel processing to parallelize queries.
  • Netflix Priam - Co-Process for backup/recovery, Token Management, and Centralized Configuration management for Cassandra.
  • OhmData C5 - improved version of HBase.
  • Sqrrl - NoSQL database on top of Apache Accumulo.
  • Tephra - Transactions for HBase.
  • Twitter Manhattan - real-time, multi-tenant distributed database for Twitter scale.

Document Data Model

  • Actian Versant - commercial object-oriented database management system.
  • Crate Data - open-source, massively scalable data store that requires zero administration.
  • Facebook Apollo - Facebook’s Paxos-like NoSQL database.
  • jumboDB - document oriented datastore over Hadoop.
  • LinkedIn Espresso - horizontally scalable document-oriented NoSQL data store.
  • MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
  • Microsoft DocumentDB - fully-managed, highly-scalable, NoSQL document database service.
  • MongoDB - Document-oriented database system.
  • RavenDB - A transactional, open-source Document Database.
  • RethinkDB - document database that supports queries like table joins and group by.
  • TokuMX - High-Performance MongoDB Distribution.

Key-value Data Model

  • Aerospike - flash-optimized, in-memory NoSQL database. Open source, with "server code in C (not Java or Erlang) precisely tuned to avoid context switching and memory copies."
  • Amazon DynamoDB - distributed key/value store, implementation of Dynamo paper.
  • Edis - protocol-compatible server replacement for Redis.
  • ElephantDB - Distributed database specialized in exporting data from Hadoop.
  • EventStore - distributed time series database.
  • HyperDex - next generation key-value store.
  • LinkedIn Krati - simple persistent data store with very low latency and high throughput.
  • LinkedIn Voldemort - distributed key/value storage system.
  • Oracle NoSQL Database - distributed key-value database by Oracle Corporation.
  • Redis - in-memory key-value datastore.
  • Redis Sentinel - system designed to help manage Redis instances.
  • Riak - a decentralized datastore.
  • Storehaus - library to work with asynchronous key value stores, by Twitter.
  • Tarantool - an efficient NoSQL database and a Lua application server.
  • TreodeDB - key-value store that's replicated and sharded and provides atomic multirow writes.

Graph Data Model

  • Apache Giraph - implementation of Pregel, based on Hadoop.
  • Apache Spark Bagel - implementation of Pregel, part of Spark.
  • ArangoDB - multi-model distributed database.
  • Facebook TAO - distributed data store widely used at Facebook to store and serve the social graph.
  • Faunus - Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster.
  • Google Cayley - open-source graph database.
  • Google Pregel - graph processing framework.
  • GraphLab PowerGraph - a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API.
  • GraphX - resilient distributed graph system on Spark.
  • Gremlin - graph traversal language.
  • InfiniteGraph - distributed graph database.
  • Infovore - RDF-centric Map/Reduce framework.
  • Intel GraphBuilder - tools to construct large-scale graphs on top of Hadoop.
  • MapGraph - Massively Parallel Graph processing on GPUs.
  • Neo4j - graph database written entirely in Java.
  • OrientDB - document and graph database.
  • Phoebus - framework for large scale graph processing.
  • Sparksee - scalable high-performance graph database.
  • Titan - distributed graph database, built over Cassandra.
  • Twitter FlockDB - distributed graph database.
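Apache Giraph, Apache Spark Bagel and Google Pregel in the list above share Pregel's vertex-centric, bulk-synchronous model: in each superstep, every vertex that received messages processes them, may update its own value, and may send messages along its out-edges; a barrier separates supersteps, and the computation halts when no messages are in flight. The single-process Python sketch below shows that loop using single-source shortest paths as the example algorithm. The algorithm choice and the function name `pregel_sssp` are illustrative assumptions, not any listed project's API, and edge weights are assumed non-negative so the loop terminates.

```python
from __future__ import annotations

import math


def pregel_sssp(
    edges: dict[str, list[tuple[str, float]]], source: str
) -> dict[str, float]:
    """Single-source shortest paths as a Pregel-style superstep loop.

    `edges` maps each vertex to its (neighbor, weight) out-edges.
    Weights are assumed non-negative so the loop terminates.
    """
    # Every vertex that appears as a key, an edge target, or the source.
    vertices = set(edges) | {source}
    for targets in edges.values():
        vertices.update(target for target, _ in targets)

    # Vertex state: best known distance from the source.
    value = {v: math.inf for v in vertices}
    # Superstep 0: the source receives a single message with distance 0.
    inbox: dict[str, list[float]] = {source: [0.0]}

    while inbox:  # halt once no messages are in flight
        outbox: dict[str, list[float]] = {}
        # Only vertices with incoming messages are active in this superstep.
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < value[vertex]:
                # Improved distance: update state and message the neighbors.
                value[vertex] = candidate
                for neighbor, weight in edges.get(vertex, []):
                    outbox.setdefault(neighbor, []).append(candidate + weight)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
    return value


if __name__ == "__main__":
    graph = {
        "a": [("b", 1.0), ("c", 4.0)],
        "b": [("c", 2.0), ("d", 5.0)],
        "c": [("d", 1.0)],
    }
    print(pregel_sssp(graph, "a"))
```

Taking the `min` over a vertex's inbox does locally what a Pregel combiner would do before delivery; in a distributed engine the same per-vertex logic would run in parallel across graph partitions.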

NewSQL Databases

  • Actian Ingres - commercially supported, open-source SQL relational database management system.
  • BayesDB - statistics-oriented SQL database.
  • Cockroach - Scalable, Geo-Replicated, Transactional Datastore.
  • Datomic - distributed database designed to enable scalable, flexible and intelligent applications.
  • FoundationDB - distributed database, inspired by F1.
  • Google F1 - distributed SQL database built on Spanner.
  • Google Spanner - globally distributed semi-relational database.
  • H-Store - experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications.
  • HandlerSocket - NoSQL plugin for MySQL/MariaDB.
  • IBM DB2 - object-relational database management system.
  • InfiniSQL - infinitely scalable RDBMS.
  • MemSQL - in-memory SQL database with optimized columnar storage on flash.
  • NuoDB - SQL/ACID compliant distributed database.
  • Oracle Database - object-relational database management system.
  • Oracle TimesTen in-Memory Database - in-memory, relational database management system with persistence and recoverability.
  • Pivotal GemFire XD - Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS.
  • SAP HANA - in-memory, column-oriented, relational database management system.
  • SenseiDB - distributed, realtime, semi-structured database.
  • Sky - database used for flexible, high performance analysis of behavioral data.
  • SymmetricDS - open source software for both file and database synchronization.
  • Teradata Database - complete relational database management system.
  • VoltDB - in-memory NewSQL database.

Columnar Databases

  • Amazon Redshift - data warehouse service, based on PostgreSQL.
  • C-Store - column oriented DBMS.
  • Google BigQuery - framework for interactive analysis, implementation of Dremel.
  • Google Dremel - scalable, interactive ad hoc query system for analysis of read-only nested data.
  • MonetDB - column store database.
  • Parquet - columnar storage format for Hadoop.
  • Pivotal Greenplum - purpose-built, dedicated analytic data warehouse.
  • Vertica - designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses.

Time-Series Databases

  • Cube - uses MongoDB to store time series data.
  • InfluxDB - distributed time series database.
  • KairosDB - similar to OpenTSDB but can use Cassandra for storage.
  • OpenTSDB - distributed time series database on top of HBase.

SQL-like processing

  • Actian SQL for Hadoop - high performance interactive SQL access to all Hadoop data.
  • AMPLab Shark - data warehouse system for Spark.
  • Apache Drill - framework for interactive analysis, inspired by Dremel.
  • Apache HCatalog - table and storage management layer for Hadoop.
  • Apache Hive - SQL-like data warehouse system for Hadoop.
  • Apache Optiq - framework that allows efficient translation of queries involving heterogeneous and federated data.
  • Apache Phoenix - SQL skin over HBase.
  • BlinkDB - massively parallel, approximate query engine.
  • Cloudera Impala - framework for interactive analysis, inspired by Dremel.
  • Concurrent Lingual - SQL-like query language for Cascading.
  • Datasalt Splout SQL - full SQL query engine for big datasets.
  • Facebook PrestoDB - distributed SQL query engine.
  • JethroData - index-based SQL engine for Hadoop.
  • Metanautix Quest - data compute engine.
  • Pivotal HAWQ - SQL-like data warehouse system for Hadoop.
  • RainstorDB - database for storing petabyte-scale volumes of structured and semi-structured data.
  • Spark Catalyst - query optimization framework for Spark and Shark.
  • SparkSQL - Manipulating Structured Data Using Spark.
  • Splice Machine - a full-featured SQL-on-Hadoop RDBMS with ACID transactions.
  • Stinger - interactive query for Hive.
  • Tajo - distributed data warehouse system on Hadoop.
  • Trafodion - enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads.

Integrated Development Environments

Data Ingestion

  • Amazon Kinesis - real-time processing of streaming data at massive scale.
  • Apache Chukwa - data collection system.
  • Apache Flume - service to manage large amount of log data.
  • Apache Samza - stream processing framework, based on Kafka and YARN.
  • Apache Sqoop - tool to transfer data between Hadoop and a structured datastore.
  • Apache UIMA - framework for Unstructured Information Management applications: software systems that analyze large volumes of unstructured information in order to discover knowledge relevant to an end user.
  • Cloudera Morphlines - framework that helps ETL into Solr, HBase and HDFS.
  • Facebook Scribe - streamed log data aggregator.
  • Fluentd - tool to collect events and logs.
  • Google Photon - geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency.
  • Heka - open source stream processing software system.
  • HIHO - framework for connecting disparate data sources with Hadoop.
  • LinkedIn Databus - stream of change capture events for a database.
  • LinkedIn Kamikaze - utility package for compressing sorted integer arrays.
  • LinkedIn White Elephant - log aggregator and dashboard.
  • Logstash - a tool for managing events and logs.
  • Netflix Suro - data pipeline service for collecting, aggregating, and dispatching large volumes of application events, including log data; based on Chukwa.
  • Pinterest Secor - service implementing Kafka log persistence.
  • Record Breaker - Automatic structure for your text-formatted data.
  • TIBCO Enterprise Message Service - standards-based messaging middleware.
  • Twitter Zipkin - distributed tracing system that helps gather timing data for all the disparate services at Twitter.
  • Vibe Data Stream - streaming data collection for real-time Big Data analytics.

Message-oriented middleware

  • ActiveMQ - open source messaging and Integration Patterns server.
  • Amazon Simple Queue Service - fast, reliable, scalable, fully managed queue service.
  • Apache Kafka - distributed publish-subscribe messaging system.
  • Apache Qpid - messaging tools that speak AMQP and support many languages and platforms.
  • Apollo - ActiveMQ's next generation of messaging.
  • Beanstalkd - simple, fast work queue.
  • Bit.ly NSQ - realtime distributed message processing at scale.
  • Celery - Distributed Task Queue.
  • Crossroads I/O - library for building scalable and high performance distributed applications.
  • Darner - simple, lightweight message queue.
  • Gearman - Job Server.
  • HornetQ - open source project to build a multi-protocol, embeddable, very high performance, clustered, asynchronous messaging system.
  • IronMQ - easy-to-use highly available message queuing service.
  • Kestrel - distributed message queue system.
  • Marconi - queuing and notification service made by and for OpenStack, but usable outside it as well.
  • RabbitMQ - Robust messaging for applications.
  • RestMQ - message queue which uses HTTP as transport, JSON to format a minimalist protocol and is organized as REST resources.
  • RQ - simple Python library for queueing jobs and processing them in the background with workers.
  • Sidekiq - Simple, efficient background processing for Ruby.
  • ZeroMQ - The Intelligent Transport Layer.

Service Programming

  • Akka Toolkit - runtime for distributed, and fault tolerant event-driven applications on the JVM.
  • Apache Avro - data serialization system.
  • Apache Curator - Java libraries for Apache ZooKeeper.
  • Apache Karaf - OSGi runtime that runs on top of any OSGi framework.
  • Apache Thrift - framework to build binary protocols.
  • Apache ZooKeeper - centralized service for maintaining configuration information, naming, and distributed synchronization.
  • Google Chubby - a lock service for loosely-coupled distributed systems.
  • LinkedIn Norbert - cluster manager.
  • MPICH - high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
  • OpenMPI - message passing framework.
  • Serf - decentralized solution for service discovery and orchestration.
  • Spotify Luigi - a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
  • Spring XD - distributed and extensible system for data ingestion, real time analytics, batch processing, and data export.
  • Twitter Elephant Bird - libraries for working with LZOP-compressed data.
  • Twitter Finagle - asynchronous network stack for the JVM.

Scheduling

Machine Learning

  • Apache Mahout - machine learning library for Hadoop.
  • Ayasdi Core - tool for topological data analysis.
  • brain - Neural networks in JavaScript.
  • Cloudera Oryx - real-time large-scale machine learning.
  • Concurrent Pattern - machine learning library for Cascading.
  • convnetjs - Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
  • Decider - Flexible and Extensible Machine Learning in Ruby.
  • etcML - text classification with machine learning.
  • Etsy Conjecture - scalable Machine Learning in Scalding.
  • Google Sibyl - System for Large Scale Machine Learning at Google.
  • H2O - statistical, machine learning and math runtime for Hadoop.
  • IBM Watson - cognitive computing system.
  • MLbase - distributed machine learning libraries for the BDAS stack.
  • MLPNeuralNet - Fast multilayer perceptron neural network library for iOS and Mac OS X.
  • nupic - Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.
  • PredictionIO - machine learning server built on Hadoop, Mahout and Cascading.
  • scikit-learn - machine learning in Python.
  • Spark MLlib - a Spark implementation of some common machine learning (ML) functionality.
  • Sparkling Water - combine H2O's Machine Learning capabilities with the power of the Spark platform.
  • Vahara - Machine learning and natural language processing with Apache Pig.
  • Viv - global platform that enables developers to plug into and create an intelligent, conversational interface to anything.
  • Vowpal Wabbit - learning system sponsored by Microsoft and Yahoo!.
  • WEKA - suite of machine learning software.
  • Wit - Natural Language for the Internet of Things.
  • Wolfram Alpha - computational knowledge engine.

Benchmarking

Security

System Deployment

  • Ankush - big data cluster management tool that creates and manages clusters of different technologies.
  • Apache Ambari - operational framework for Hadoop management.
  • Apache Bigtop - system deployment framework for the Hadoop ecosystem.
  • Apache Helix - cluster management framework.
  • Apache Mesos - cluster manager.
  • Apache Slider - YARN application to deploy existing distributed applications on YARN.
  • Apache Whirr - set of libraries for running cloud services.
  • Apache YARN - Cluster manager.
  • Brooklyn - library that simplifies application deployment and management.
  • Buildoop - similar to Apache Bigtop, based on the Groovy language.
  • Cloudera HUE - web application for interacting with Hadoop.
  • Deimos - Mesos containerizer hooks for Docker.
  • Develoop - tool for provisioning, managing and monitoring Apache Hadoop.
  • Facebook Autoscale - load balancer that concentrates workload on a server until it has at least a medium-level workload.
  • Facebook Prism - multi-datacenter replication system.
  • Ganglia Monitoring System - scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.
  • Genie - provides RESTful APIs to run Hadoop, Hive and Pig jobs, and to manage multiple Hadoop resources and perform job submissions across them.
  • Google Borg - job scheduling and monitoring system.
  • Google Omega - job scheduling and monitoring system.
  • Hannibal - tool to help monitor and maintain HBase clusters that are configured for manual splitting.
  • Hortonworks HOYA - application that can deploy HBase cluster on YARN.
  • Jumbune - open-source product built for analyzing Hadoop clusters and MapReduce jobs.
  • Marathon - Mesos framework for long-running services.

Applications

  • Adobe Spindle - Next-generation web analytics processing with Scala, Spark, and Parquet.
  • Apache Kiji - framework to collect and analyze data in real-time, based on HBase.
  • Apache Nutch - open source web crawler.
  • Apache OODT - capturing, processing and sharing of data for NASA's scientific archives.
  • Apache Tika - content analysis toolkit.
  • Domino - run, scale, share, and deploy models without any infrastructure.
  • Eclipse BIRT - Eclipse-based reporting system.
  • Eventhub - open source event analytics platform.
  • HIPI Library - API for performing image processing tasks on Hadoop's MapReduce.
  • Hunk - Splunk analytics for Hadoop.
  • MADlib - data-processing library of an RDBMS to analyze data.
  • PivotalR - R on Pivotal HD / HAWQ and PostgreSQL.
  • Qubole - auto-scaling Hadoop cluster, built-in data connectors.
  • Sense - Cloud Platform for Data Science and Big Data Analytics.
  • Snowplow - enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
  • SparkR - R frontend for Spark.
  • Splunk - analyzer for machine-generated data.
  • Talend - unified open source environment for YARN, Hadoop, HBase, Hive, HCatalog & Pig.

Data Warehouse

Search engine and framework

  • Apache Lucene - Search engine library.
  • Apache Solr - Search platform for Apache Lucene.
  • ElasticSearch - Search and analytics engine based on Apache Lucene.
  • Elasticsearch Hadoop - Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive and Apache Pig.
  • Enigma.io - Freemium robust web application for exploring, filtering, analyzing, searching and exporting massive datasets scraped from across the Web.
  • Facebook Unicorn - social graph search platform.
  • Google Caffeine - continuous indexing system.
  • Google Percolator - continuous indexing system.
  • TeraGoogle - large search index.
  • Haeinsa - linearly scalable multi-row, multi-table transaction library for HBase based on Percolator.
  • HBase Coprocessor - implementation of Percolator, part of HBase.
  • hIndex - Secondary Index for HBase.
  • Lily HBase Indexer - quickly and easily search for any content stored in HBase.
  • LinkedIn Bobo - faceted search implementation written purely in Java, an extension to Apache Lucene.
  • LinkedIn Cleo - flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead search.
  • LinkedIn Galene - search architecture at LinkedIn.
  • LinkedIn Zoie - real-time search/indexing system written in Java.
  • Sphinx Search Server - full-text search engine.

MySQL forks and evolutions

  • Amazon RDS - MySQL databases in Amazon's cloud.
  • Drizzle - evolution of MySQL 6.0.
  • Google Cloud SQL - MySQL databases in Google's cloud.
  • MariaDB - enhanced, drop-in replacement for MySQL.
  • MySQL Cluster - MySQL implementation using NDB Cluster storage engine.
  • Percona Server - enhanced, drop-in replacement for MySQL.
  • ProxySQL - High Performance Proxy for MySQL.
  • TokuDB - storage engine for MySQL and MariaDB.
  • WebScaleSQL - collaboration among engineers from several companies that face similar challenges in running MySQL at scale.

PostgreSQL forks and evolutions

  • HadoopDB - hybrid of MapReduce and DBMS.
  • IBM Netezza - high-performance data warehouse appliances.
  • Postgres-XL - Scalable Open Source PostgreSQL-based Database Cluster.
  • RecDB - Open Source Recommendation Engine Built Entirely Inside PostgreSQL.
  • Stado - open source MPP database system solely targeted at data warehousing and data mart applications.
  • Yahoo Everest - multi-petabyte MPP database derived from PostgreSQL.

Memcached forks and evolutions

Embedded Databases

  • Actian PSQL - ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
  • BerkeleyDB - a software library that provides a high-performance embedded database for key/value data.
  • HamsterDB - transactional key-value database.
  • HanoiDB - Erlang LSM BTree Storage.
  • LevelDB - a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
  • LMDB - ultra-fast, ultra-compact key-value embedded data store developed by Symas.
  • RocksDB - embeddable persistent key-value store for fast storage based on LevelDB.

Business Intelligence

  • ActivePivot - Java In-Memory OLAP cube stored in columns, with clearly decoupled pre/post processing.
  • Adatao - business intelligence and data science platform.
  • Apama analytics - platform for streaming analytics and intelligent automated action.
  • Atigeo xPatterns - data analytics platform.
  • BIME Analytics - business intelligence platform in the cloud.
  • Chartio - lean business intelligence platform to visualize and explore your data.
  • Datapine - self-service business intelligence tool in the cloud.
  • Jaspersoft - powerful business intelligence suite.
  • Jedox Palo - customisable Business Intelligence platform.
  • Microsoft - business intelligence software and platform.
  • Microstrategy - software platforms for business intelligence, mobile intelligence, and network applications.
  • Pentaho - business intelligence platform.
  • Qlik - business intelligence and analytics platform.
  • SpagoBI - open source business intelligence platform.
  • Spotfire - business intelligence platform.
  • Tableau - business intelligence platform.
  • Teradata Aster - Big Data Analytics.
  • Tessera - Environment for Deep Analysis of Large Complex Data.
  • Zeppelin - open source data analysis environment on top of Hadoop.
  • Zoomdata - Big Data Analytics.

Data Visualization

  • Arbor - graph visualization library using web workers and jQuery.
  • CartoDB - open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API.
  • Chart.js - open source HTML5 Charts visualizations.
  • Crossfilter - JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js.
  • Cubism - JavaScript library for time series visualization.
  • Cytoscape - JavaScript library for visualizing complex networks.
  • D3 - JavaScript library for manipulating documents.
  • DC.js - Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3.
  • Envisionjs - dynamic HTML5 visualization.
  • Freeboard - open-source real-time dashboard builder for IoT and other web mashups.
  • Gephi - An award-winning open-source platform for visualizing and manipulating large graphs and network connections.
  • Google Charts - simple charting API.
  • Grafana - graphite dashboard frontend, editor and graph composer.
  • Graphite - scalable real-time graphing.
  • Highcharts - simple and flexible charting API.
  • IPython - provides a rich architecture for interactive computing.
  • Keylines - toolkit for visualizing the networks in your data.
  • Matplotlib - plotting with Python.
  • NVD3 - chart components for d3.js.
  • Peity - Progressive SVG bar, line and pie charts.
  • Plot.ly - Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly's online spreadsheet. Fork others' plots.
  • Recline - simple but powerful library for building data applications in pure JavaScript and HTML.
  • Redash - open-source platform to query and visualize data.
  • Sigma.js - JavaScript library dedicated to graph drawing.
  • Vega - a visualization grammar.

Internet of things and sensor data

  • TempoIQ - Cloud-based sensor analytics.

Papers

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

1999