How to make the best use of Big Data?

Przemysław Rosowski
Big Data/AI engineer
February 18th, 2019

In this article you will learn:

  1. How big data is currently used across various industries.
  2. Which technologies are heavily used in modern organizations.
  3. What the scale of their application on the market is.

Recent articles on data storage and analysis technologies have mainly covered the handling of information generated by Internet of Things sensors or portable devices, and the merging of these data sources.

Practically every area of human activity is generating more and more data. For example, millions of smartphones, each with dozens of installed applications, steadily produce a vast amount of data that marketers and others transform into useful information.

Solutions based on the Hadoop platform were created in response to the growing amount of data and to rising expectations that operations on this data complete faster.

Specialized databases have also become more popular: databases that support indexing and faster text search, document databases that deal well with unstructured data, and graph databases that support graph analysis and searching for relations between entities.

These technologies are already being used effectively in many industries including:

Marketing and sales

The classic needs of marketing and sales departments include understanding trends, creating recommendation systems and prediction models. In addition to catering to these classic needs, Big Data technologies allow for efficient combination of standard data sources with the analytics from non-traditional data sources, e.g. social media or geolocation.

In e-commerce, the ability to precisely target advertisements and recommend products that match a user's interests and behavior is already the basis on which most online stores function. These solutions can be developed continuously thanks to architectures that efficiently process data such as user behavior logs.
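
To make the idea of recommendations built from user behavior logs more concrete, here is a minimal sketch of a co-occurrence recommender ("users who viewed X also viewed Y"). The log format, the product names, and the helper functions `build_co_views` and `recommend` are assumptions made for illustration, not a description of any particular store's system.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical behavior log: one (user_id, product_id) pair per product view.
view_log = [
    ("u1", "laptop"), ("u1", "mouse"), ("u1", "bag"),
    ("u2", "laptop"), ("u2", "mouse"),
    ("u3", "laptop"), ("u3", "bag"),
    ("u4", "mouse"), ("u4", "keyboard"),
]


def build_co_views(log):
    """Count how often each pair of products was viewed by the same user."""
    products_by_user = defaultdict(set)
    for user, product in log:
        products_by_user[user].add(product)

    co_views = defaultdict(Counter)
    for products in products_by_user.values():
        for a, b in combinations(sorted(products), 2):
            co_views[a][b] += 1
            co_views[b][a] += 1
    return co_views


def recommend(co_views, product, top_n=2):
    """Return the products most often viewed together with `product`."""
    related = co_views.get(product, Counter())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


co_views = build_co_views(view_log)
print(recommend(co_views, "laptop"))
```

In production the same counting step would typically run on a distributed engine over far larger logs, and the ranking would usually be normalized for product popularity.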

Examples of data sources in sales and marketing

geolocation
social media
interests
websites

Industry

Industrial companies spend vast amounts of time on repairs and maintenance of production resources. Some of the associated costs can be reduced by using modern technologies to predict failures before they occur.
Nowadays, sensors can be integrated with machines to generate valuable operational data. Thanks to solutions that support processing these data streams, as well as supporting connections to real-time and time-series databases, it is now possible to monitor devices to detect operational patterns, catch anomalies, and perform predictive analysis to determine the time to a potential failure.
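
As an illustration of catching anomalies in a sensor stream, the sketch below flags readings that deviate strongly from the mean of the preceding window (a rolling z-score). The function name `detect_anomalies`, the window size, the threshold, and the sample vibration values are all assumptions chosen for the example; real predictive maintenance systems generally use richer models.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far from the mean of the previous `window` values."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip scoring when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies


# Hypothetical vibration readings with one obvious spike at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 4.8, 1.1, 1.0]
print(detect_anomalies(vibration))  # prints [7]
```

Note that in this simple version a flagged value still enters the window, so it temporarily inflates the standard deviation for the readings that follow.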

You can find more information about predictive maintenance here.

Security and banking

Fraud detection is currently one of the most pressing topics in the banking sector. Big Data technologies enrich traditional fraud prevention techniques through processes like real-time analysis of user behavior.
This enables an immediate reaction to cybercrime attempts, such as the suspicious use of a payment card.
New database technologies, such as graph databases, make it easier to perform routine surveillance for suspicious transactions and user behavior.
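
The kind of question a graph model makes easy is "which accounts are connected to a known suspicious account through shared devices or cards?". The sketch below answers it with a plain breadth-first traversal over an in-memory adjacency map. It does not use a real graph database, and the node names, the `links` data, and both helper functions are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical links between accounts and the devices or cards they used.
links = [
    ("acct:A", "device:1"),
    ("acct:B", "device:1"),
    ("acct:B", "card:9"),
    ("acct:C", "card:9"),
    ("acct:D", "device:2"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def accounts_linked_to(graph, start):
    """Return all other accounts reachable from `start` via shared attributes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in seen if n.startswith("acct:") and n != start)


graph = build_graph(links)
print(accounts_linked_to(graph, "acct:A"))  # prints ['acct:B', 'acct:C']
```

A graph database would express the same traversal as a query and execute it over indexed relationships instead of loading everything into memory.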

Healthcare

With the increasing prevalence of mobile devices, the ability to track the health of patients has improved. Mobile devices generate an incredible amount of data, thus applications have tremendous potential to make a meaningful impact in healthcare. For example, data from sensors contained in these devices (e.g., wearables) can be sent to the cloud, where they are then analyzed.
Another example is the EHR (Electronic Health Record) system. Each patient has a digital record that contains their treatment history, personal data, etc. Such a system may even suggest to the doctor which steps in the treatment should be taken.

Popular technologies

There is a proliferation of solutions on the market. The most important of them are:

NoSQL databases

Their name derives from “non-SQL”. The popularity of such databases has increased with the growing demand for storage of unstructured data and inadequate performance of relational databases in some applications.

Such databases focus primarily on specific types of data and queries. The leading types are column-oriented, key-value, document, and graph databases.

Apache Hadoop and MapReduce

Apache Hadoop is a set of open-source software that enables distributed data processing. Its most significant advantage is scalability: if necessary, you can increase computing performance and disk space without having to change the data format.
MapReduce enables efficient processing of data stored in the Hadoop Distributed File System (HDFS) through map and reduce operations.
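
The map and reduce steps can be illustrated with the classic word count. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce); it is not Hadoop code, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big insight", "data at scale"])
print(counts)
```

In a real cluster, the map calls run in parallel on the nodes that hold the data blocks, and the shuffle moves intermediate pairs across the network to the reducers.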

Apache Spark and streaming technologies

Apache Spark allows for efficient data processing. Unlike MapReduce, it performs calculations on data held in memory (RAM), which significantly speeds up the process.
Spark also supports graph processing with the GraphX library, SQL processing (Spark SQL), and stream processing (Spark Streaming).
Real-time data analysis is a popular approach in areas such as predictive maintenance and social media analysis.
In addition to Spark Streaming, there are other tools on the market that support this type of processing, for example Apache Storm and Apache Flink.
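
A core building block of stream processing is windowing: grouping an unbounded sequence of events into fixed time intervals and emitting a result as each interval closes. The sketch below shows tumbling (non-overlapping) windows in plain Python. It is not the API of Spark, Storm, or Flink, and it assumes that events arrive in timestamp order with integer timestamps in seconds; the function name and sample events are hypothetical.

```python
from collections import Counter


def tumbling_windows(events, window_seconds=60):
    """Yield (window_start, counts_per_key) each time a window closes.

    `events` is an iterable of (timestamp_seconds, key) pairs, assumed to
    arrive in timestamp order.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # The event belongs to a later window, so emit the finished one.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        # Emit the last, possibly incomplete, window when the input ends.
        yield current_start, dict(counts)


events = [(5, "login"), (42, "login"), (61, "payment"), (75, "login"), (130, "payment")]
for window_start, counts in tumbling_windows(events):
    print(window_start, counts)
```

Because the function is a generator, it can consume an endless source and keeps only the current window in memory. Production engines additionally deal with late and out-of-order events, which this sketch ignores.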

The scale of applications

Big Data technologies enable many activities – from real-time data analysis to batch processing.
However, the right approach depends strictly on the type of data and the operations performed on it. Nowadays, this mainly means moving computation to the cloud, which in turn allows for flexible resource management.

Edge computing

As the amount of data coming from Internet of Things devices increases, the importance of so-called edge computing is also growing. This means carrying out analyses in a distributed way, with data processed close to its source.
This reduces the amount of information transmitted over the network and, as a consequence, also decreases the required storage space, because some operations can be performed at an earlier stage.
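
The reduction in transmitted data can be sketched as follows: instead of sending every raw reading to the cloud, the edge device sends one small summary per batch. The sensor ID, the synthetic readings, and the `summarize_at_edge` helper are assumptions made for this example, and JSON length is used only as a rough stand-in for payload size.

```python
import json
from statistics import mean


def summarize_at_edge(sensor_id, readings):
    """Reduce a non-empty batch of raw readings to a compact summary."""
    return {
        "sensor_id": sensor_id,
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
    }


# Hypothetical batch: 600 temperature readings collected on the device.
raw = [20.0 + (i % 10) * 0.1 for i in range(600)]

raw_bytes = len(json.dumps({"sensor_id": "temp-01", "readings": raw}))
summary = summarize_at_edge("temp-01", raw)
summary_bytes = len(json.dumps(summary))

print(f"raw payload: {raw_bytes} bytes, summary payload: {summary_bytes} bytes")
```

The trade-off is that detail discarded at the edge cannot be recovered later, so which statistics to keep depends on the analyses planned downstream.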

Big data - challenges for companies

It has long been known that the technologies described above are used successfully by the largest technology giants, such as Google, Facebook, and Amazon.
Currently, the biggest challenge for many companies is how to process vast amounts of data as quickly as possible. Analyzing such large data sets may lead to crucial breakthroughs.
However, caution is needed here: the choice of technology and of the approach to implementing big data should be preceded by a thorough analysis that weighs the pros and cons of the different solutions.

A well-designed architecture plays a crucial role in effective data processing and in making it possible to build advanced solutions with artificial intelligence algorithms.
