AWS-based Big Data project

Challenge

After initial research, FortySeven engineers outlined the following key requirements for the solution:

Extensive reporting on application usage data collected from users' devices, with the volume of reported data expected to grow.
The ability to calculate both existing and new types of statistics from all collected data, which means the data must be stored indefinitely.
The ability to filter calculated statistics and build charts based on specific criteria.
The ability to run statistical calculations automatically for specific time periods.

FortySeven developers implemented a multi-stage data pipeline:

First stage: collection of raw data. To reduce the cost of the reporting API and protect it from unauthorized access, the team used Amazon Cognito, which provides fine-grained permission management for both authenticated users and guests. The Android apps send reports from users' devices directly to an Amazon S3 bucket on which Cognito grants write-only access. Because clients can only write to this bucket, they cannot read or steal the data stored in it. As a result, the raw data arrives as a large number of small report files in a dedicated S3 bucket.
Second stage: data compression. Before the data is stored in the main bucket, we convert it into larger, compressed, columnar files that are optimized for reading only part of the data. The conversion is performed by a Spark job running on an Amazon EMR Hadoop cluster built from Spot Instances, which starts and shuts down automatically. This stage shows a clear benefit of AWS for the project: EMR manages the Hadoop cluster automatically, with no engineering effort required.
Third stage: statistics. FortySeven engineers manually launched a similar EMR-based cluster and ran distributed Spark jobs on it to calculate the required statistics. The output is saved to a large SQL database hosted on AWS as several SQL tables optimized for date-range queries.
Fourth stage: visualization of statistics. A web interface running on Apache Tomcat uses the D3 JavaScript library to draw charts from the results stored in the SQL database, filtered by user-selected criteria.
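The data flow of stages two to four can be illustrated with a small in-memory Java sketch. It does not use Spark, EMR, S3, Parquet, or a SQL database. Everything in it is a hypothetical illustration, not the project's actual code:

The `Report` schema (device ID, app, date).
The "daily active devices" statistic.
The greedy in-order batching used to plan compaction.
The JSON shape handed to D3.

A `TreeMap` keyed by date stands in for a SQL table indexed on its date column, so filtering by date range becomes a `subMap` call.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class PipelineSketch {

    // Hypothetical report schema: one usage event from one device on one day.
    public record Report(String deviceId, String app, LocalDate date) {}

    // Stage 2 (planning only, hypothetical strategy): walk the small raw files in
    // arrival order and start a new batch whenever adding the next file would
    // exceed targetBytes. Each batch would later be rewritten as one larger file.
    // A single file bigger than targetBytes ends up alone in its own batch.
    public static List<List<Long>> planCompaction(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Stage 3: count distinct devices per day for one app. The sorted map keeps
    // dates in order, mirroring a SQL table indexed on its date column.
    public static NavigableMap<LocalDate, Integer> dailyActiveDevices(List<Report> reports, String app) {
        Map<LocalDate, Set<String>> devicesByDay = new TreeMap<>();
        for (Report r : reports) {
            if (r.app().equals(app)) {
                devicesByDay.computeIfAbsent(r.date(), day -> new HashSet<>()).add(r.deviceId());
            }
        }
        NavigableMap<LocalDate, Integer> counts = new TreeMap<>();
        devicesByDay.forEach((day, devices) -> counts.put(day, devices.size()));
        return counts;
    }

    // Stage 4: select an inclusive date range and render it as a JSON array
    // (hypothetical format) that a D3 chart could load.
    public static String rangeAsJson(NavigableMap<LocalDate, Integer> stats, LocalDate from, LocalDate to) {
        StringBuilder json = new StringBuilder("[");
        for (Map.Entry<LocalDate, Integer> e : stats.subMap(from, true, to, true).entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append("{\"date\":\"").append(e.getKey())
                .append("\",\"value\":").append(e.getValue()).append('}');
        }
        return json.append(']').toString();
    }

    public static void main(String[] args) {
        // prints [[40, 30], [50], [200], [10]]
        System.out.println(planCompaction(List.of(40L, 30L, 50L, 200L, 10L), 100L));

        LocalDate d1 = LocalDate.of(2024, 1, 1);
        LocalDate d2 = d1.plusDays(1);
        List<Report> reports = List.of(
            new Report("dev-a", "app-1", d1),
            new Report("dev-a", "app-1", d1),  // same device twice on one day
            new Report("dev-b", "app-1", d1),
            new Report("dev-b", "app-1", d2),
            new Report("dev-c", "app-2", d2)); // other app, filtered out
        NavigableMap<LocalDate, Integer> stats = dailyActiveDevices(reports, "app-1");
        // prints [{"date":"2024-01-01","value":2},{"date":"2024-01-02","value":1}]
        System.out.println(rangeAsJson(stats, d1, d2));
    }
}
```

In the pipeline described above, each compaction batch would presumably be rewritten by Spark as one compressed columnar file. The date-range filter would be a SQL query against the statistics tables rather than an in-memory map lookup.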

Industry

Consumer Goods and Services

Expertise

Mobile Application Development, Front-End Development

Technologies

Java, Apache Spark on Hadoop, Apache Parquet, MySQL Server, Apache Tomcat, Linux, AWS

Approach

Performing all of these statistical calculations requires a lot of disk space and CPU power. Amazon Web Services was chosen because it provides the complete set of services and resources this solution requires at a reasonable cost. AWS offers relatively inexpensive storage for an almost unlimited amount of data. It also supports automated deployment of Hadoop clusters, which was necessary to meet the customer's requirements. On this platform, FortySeven developers implemented a multi-stage data pipeline.

Result

The Amazon Web Services platform made it possible to implement a complex big data pipeline that overcomes the main constraints of any big data processing:

Disk space
CPU power
Distributed computing

Amazon S3 is one of the most cost-effective storage solutions in its class. Amazon EMR provides an easy, highly automated way to quickly deploy and scale a large Hadoop cluster, significantly reducing development effort.


FortySeven Software Professionals is a leading European-based software development, IT staffing, IT outsourcing, and IT consulting company (certified under ISO 27001 and ISO 9001). We help Fortune 500 companies and startups fill expertise gaps on short-term and long-term projects.

