Big data usually refers to data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Several concepts are associated with big data; the original three were volume, variety, and velocity.
We develop data presentation layers on top of reporting and analysis frameworks, including analytical dashboards, special-purpose visuals and highly interactive personalized data analysis tools.
By connecting data sources and applications under a unified user interface, we enable end users to discover and share insights, collaborate on them, and act upon them in real time.
Apache Spark is an open-source processing engine built around speed, ease of use, and sophisticated analytics. It is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets.
Apache Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. The Hadoop framework breaks big data into blocks, which are stored across the nodes of the cluster.
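To make the idea of breaking data into blocks concrete, here is a minimal sketch in plain Python. It does not use Hadoop itself: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions chosen for illustration, and the real HDFS placement policy is more involved.

```python
# Illustrative sketch only: mimics the idea of block splitting and placement.
# Nothing here is Hadoop code; all names and sizes are hypothetical.

def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block index to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive offsets modulo the node count give distinct nodes
        # as long as replication <= len(nodes).
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


data = b"x" * 1000  # stand-in for a large file
blocks = split_into_blocks(data, block_size=256)
placement = place_blocks(len(blocks), ["node-a", "node-b", "node-c"], replication=2)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Each block is kept on more than one node in this sketch, so losing a single machine does not lose the data. That is the general reason a cluster of commodity hardware can be used for storage.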
NoSQL covers a wide variety of newer database technologies that store unstructured data. These databases provide a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
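As a rough illustration of data modeled by means other than tabular relations, the sketch below keeps JSON-like documents in memory under a key. It is not the API of any real NoSQL product: the `DocumentCollection` class and its `put`, `get`, and `find` methods are hypothetical names invented for this example.

```python
import json


class DocumentCollection:
    """Hypothetical in-memory document store, for illustration only."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Round-trip through JSON so only JSON-compatible data is accepted
        # and the stored copy is independent of the caller's object.
        self._docs[key] = json.loads(json.dumps(document))

    def get(self, key):
        # Returns None when the key is unknown.
        return self._docs.get(key)

    def find(self, field, value):
        # Documents that lack the field are simply skipped; there is no schema.
        return [d for d in self._docs.values() if d.get(field) == value]


users = DocumentCollection()
# The two documents have different fields and nesting, which a fixed
# table schema would not allow without extra columns or tables.
users.put("u1", {"name": "Ada", "languages": ["en", "fr"]})
users.put("u2", {"name": "Lin", "city": "Taipei", "profile": {"followers": 42}})

matches = users.find("city", "Taipei")
print(matches)
```

Records in the same collection do not have to share the same set of fields, which is what makes this model a fit for unstructured or loosely structured data.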