Hadoop as a critical resource for data processing in complex data scenarios


Hadoop is an open source framework that can be very useful for data processing in complex data systems, and it has been used frequently in the recent past for query processing in complex databases that contain millions of records. The major advantage of Hadoop is that it splits the full set of records into blocks distributed across a cluster, runs the query on each block, and then compiles the partial results into a single output.

This research paper attempts to understand how Hadoop can help banking organizations compile and process their data to identify customers who are potential buyers of their housing loan products. The implementation suggests that the technology can help banking organizations gain insight from complex queries in a real-time environment through quick processing of data. This technical paper is a critical analysis of how Hadoop can serve as an effective data processing framework.

Table of Contents


Abstract

Table of Contents

1.0        Introduction

1.1        Project Overview

2.0        New Framework as a solution

3.0        Outcome of the Solution

4.0        Analysis and Learning Outcome

5.0        Conclusion

6.0        References

1.0       Introduction

In the current business scenario, it is essential that organizations focus on information systems and work with the available sources of information to identify potential opportunities and shape their products or services accordingly. Given rapidly emerging business and technological trends, companies have to stay in touch with market conditions and identify, evaluate and make good use of opportunities in order to grow. (C.Murthy & Mehta, 2011)

One of the key issues for organizations is engaging with the huge volume of data held in their databases and using that data to gain insight into new product or service ventures. Data processing frameworks like Hadoop can be very useful to organizations for processing large loads of data and extracting the information relevant to specific queries.

Another major advantage of a system like Hadoop is that, being an open source framework, using it may not add much cost, whereas companies that use it effectively can obtain inputs that help them in business decision making. (Works)


 1.1      Project Overview

Real estate is one of the booming sectors, as the infrastructure industry is going through a phase of rapid development globally. Residential property development is one of the significant contributors to real estate development, and many other industrial sectors depend on and are interconnected with it. The retail banking sector is in turn one of the major contributors to the growth of residential properties. When individuals are keen on buying property, the majority of them take out a loan, and retail banks decide whether to lend based on the individual's credit score, income and personal profile.

Competition has become intense for property developers and banks alike to find good customers whose credit records are sound and from whom the banks can expect hassle-free repayment. It is essential that banking organizations aim for good accounts; at the same time, because of the massive scale of their operations, they hold millions of records pertaining to customer transactions. (Nemschoff, 2014)

If banking organizations can effectively use a framework like Hadoop to compile and process the data they hold, it becomes easier for them to analyze potential customers and promote the loan products and services they offer. Although bankers already have some information on potential customers, running this kind of analytics on blocks of data across multiple dimensions makes it easier to analyze the data and keep track of the list of potential customers. (C.Murthy & Mehta, 2011)

This kind of data analysis can give banking companies a sound basis for identifying potential customers within their existing database.

2.0       New Framework as a solution

The system that has been developed is a potential solution for banking companies to process the large volumes of data they hold about customers and their transactions. Based on the master data and the transactional data available in their database, the banking companies can take informed decisions about potential customers for home loans. (C.Murthy & Mehta, 2011)

The proposed new framework is a staged, phased data filtration model that can help banks analyze the data effectively and, on the basis of that information, identify the potential customers who are most valuable to the organization.

The following illustrative instance of the proposed framework gives an overview of how its Hadoop-based process flow can facilitate the identification of prospective customers. (C.Murthy & Mehta, 2011)

In the instance of the proposed framework:

The objective of the data processing is to identify potential customers for housing loans.

The bankers can run the proposed Hadoop framework application to extract data pertaining to salaried and self-employed customers. The application code runs through the entire database, divides it into blocks, processes the data, extracts the information and classifies the customers.

In the next round of data processing, the banking companies can extract information on further inputs, such as accounts with up-to-date credit scores or accounts that have availed other kinds of loans. The proposed code again runs through the system, and the banks carry the resulting inputs forward to further levels of processing.

With this kind of structured breakdown and processing of data, banking organizations can focus on developing a list of potential customers whom the bank can approach to promote its products and services. The major advantage of the application is that it helps the bankers identify their most prospective customers and plan promotions accordingly. (Works)
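The two filtration stages described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's application code: the record fields (`employment`, `credit_score`, `has_other_loan`), the sample data, the score threshold of 700 and the rule of excluding customers with other loans are all hypothetical, and the logic runs on an in-memory list rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical customer records; field names and values are illustrative only.
CUSTOMERS = [
    {"id": 1, "employment": "salaried", "credit_score": 760, "has_other_loan": False},
    {"id": 2, "employment": "self-employed", "credit_score": 640, "has_other_loan": True},
    {"id": 3, "employment": "salaried", "credit_score": 710, "has_other_loan": True},
    {"id": 4, "employment": "self-employed", "credit_score": 790, "has_other_loan": False},
    {"id": 5, "employment": "student", "credit_score": 720, "has_other_loan": False},
]


def stage_one(records):
    """Stage 1: keep salaried and self-employed customers, grouped by type."""
    groups = defaultdict(list)
    for rec in records:
        if rec["employment"] in ("salaried", "self-employed"):
            groups[rec["employment"]].append(rec)
    return dict(groups)


def stage_two(groups, min_score=700):
    """Stage 2: within each group, shortlist customer ids by credit profile.

    The criteria here (score at or above min_score, no other loan) are
    assumptions for the example; a bank would substitute its own rules.
    """
    shortlist = {}
    for employment, members in groups.items():
        shortlist[employment] = [
            rec["id"]
            for rec in members
            if rec["credit_score"] >= min_score and not rec["has_other_loan"]
        ]
    return shortlist


if __name__ == "__main__":
    print(stage_two(stage_one(CUSTOMERS)))
```

Keeping each stage as a separate function mirrors the phased model: the output of one round becomes the input of the next, so further rounds can be added without changing the earlier ones.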

Technical Work breakdown structure

Once the database is organized, the application code can run through the entire database irrespective of its size and split the data into blocks. Multiple nodes then run the query on their respective blocks and extract the matching information. At the end, the data extracted by all the nodes is combined and displayed for the benefit of the decision makers. Further levels of processing can be carried out on this output according to the needs of the decision makers, so that they arrive at the specific factors or critical success factor information they need, depending on the company's strategic approach to extracting the data. (Nemschoff, 2014)
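The split, per-node query and final combination described above follow the general map-and-reduce pattern. The sketch below imitates that pattern in plain Python as an illustration only: worker threads stand in for cluster nodes, and the sample records, the block size and the query (counting customers with a credit score of at least 700 per employment type) are assumptions made for the example, not details from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records; fields and values are illustrative only.
RECORDS = [
    {"employment": "salaried", "credit_score": 760},
    {"employment": "self-employed", "credit_score": 640},
    {"employment": "salaried", "credit_score": 710},
    {"employment": "self-employed", "credit_score": 790},
    {"employment": "salaried", "credit_score": 680},
    {"employment": "self-employed", "credit_score": 705},
]


def split_into_blocks(records, block_size):
    """Split the full record list into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def run_query_on_block(block):
    """'Map' step: count matching records in one block, keyed by employment."""
    counts = {}
    for rec in block:
        if rec["credit_score"] >= 700:  # assumed query condition
            counts[rec["employment"]] = counts.get(rec["employment"], 0) + 1
    return counts


def merge_results(partials):
    """'Reduce' step: add up the per-block counts into one result."""
    totals = {}
    for partial in partials:
        for key, value in partial.items():
            totals[key] = totals.get(key, 0) + value
    return totals


def run_job(records, block_size=2, workers=3):
    """Run the query on every block in parallel, then combine the results."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_query_on_block, blocks))
    return merge_results(partials)


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each block is queried independently and the merge only adds counts, the result does not depend on how the data is split, which is what lets the same query scale to more blocks and more nodes.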

The major advantage of such an activity-based approach is that processing the huge volume of data the organization holds becomes easier, and the resulting information can help the organization ensure that the project succeeds and delivers potential benefits.

3.0       Outcome of the Solution

The proposed project has been effective in providing potential solutions. When the process code was tested in a simulated environment, the results were insightful and could help banking organizations gather the data they need for decision making.

With a few inputs changed in the structured query, the bankers can use the code to extract other kinds of data in real time, without much disturbance to the organization's ongoing data operations. Based on the inputs collected from the queries, it is envisaged that using the Hadoop framework could provide the organization with the results it requires. (Nemschoff, 2014)
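One way to picture "a few inputs changed" is a query whose criteria are passed in as parameters, so the same code serves different extractions. The sketch below is a hypothetical illustration in plain Python; the helper names `make_query` and `run`, the record fields and the sample data are assumptions and do not come from the paper's code.

```python
# Hypothetical account records; fields and values are illustrative only.
ACCOUNTS = [
    {"id": 1, "employment": "salaried", "credit_score": 760, "has_other_loan": False},
    {"id": 2, "employment": "self-employed", "credit_score": 640, "has_other_loan": True},
    {"id": 3, "employment": "salaried", "credit_score": 690, "has_other_loan": True},
]


def make_query(**criteria):
    """Build a record test from field=value pairs.

    A plain value must equal the record's field; a callable value is
    applied to the field and must return True.
    """
    def matches(record):
        for field, expected in criteria.items():
            if field not in record:
                return False
            value = record[field]
            if callable(expected):
                if not expected(value):
                    return False
            elif value != expected:
                return False
        return True
    return matches


def run(query, records):
    """Return the ids of all records that satisfy the query."""
    return [rec["id"] for rec in records if query(rec)]


if __name__ == "__main__":
    # Same code, different inputs: two separate extractions.
    home_loan_prospects = make_query(
        employment="salaried", credit_score=lambda s: s >= 700
    )
    existing_borrowers = make_query(has_other_loan=True)
    print(run(home_loan_prospects, ACCOUNTS))
    print(run(existing_borrowers, ACCOUNTS))
```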

4.0       Analysis and Learning Outcome

It is evident from the development exercise that there is considerable potential in implementing a Hadoop-based query structure that can process large volumes of data and return the required information based on the inputs supplied. For the process to be effective, the query has to be correctly defined; the outcome can then be refined further using what was learned from earlier runs.

Overall, it can be stated that the results of running the code have been effective and have provided insight into how the Hadoop framework can deliver the expected results.

5.0       Conclusion

Hadoop is an open source data processing framework that is effective when large volumes of data are held in the database. Based on the information it produces, organizations can take informed decisions and a strategic approach towards new products, services or any other critical decisions the company may be considering.

In this technical paper, the focus is on using the Hadoop framework to develop a code-driven process or algorithm that helps retail bankers use existing customer data to identify potential customers, giving the banks a more focused approach to their systems and processes.

Based on the results from the code run in the simulated environment, it is evident that the Hadoop framework could be an effective platform for business analytics and can supply organizations with processed data that supports better decision making.

