Comparative Study of Machine Learning Algorithms for Rainfall Prediction

The majority of Indian farmers depend on rainfall for agriculture, so rainfall prediction is very important in an agricultural country like India. Too much or too little rainfall leads to natural disasters such as floods and droughts, which affect people across the globe every year. Rainfall prediction over drought-prone regions is of great importance for countries like India, whose economy depends largely on agriculture. A sufficiently long data record plays an important role in estimating drought properly, leading to a better appraisal for drought risk reduction. Because of the dynamic nature of the atmosphere, statistical techniques fail to provide good accuracy for rainfall prediction. We therefore use machine learning algorithms, namely Multiple Linear Regression, Random Forest Regressor and AdaBoost Regressor, with each model trained on a training data set and evaluated on a testing data set. The dataset we collected contains rainfall data from 1901 to 2015 across the various drought-affected states. The nonlinearity of rainfall data makes machine learning algorithms a better-suited technique. Comparing different approaches and algorithms should improve the accuracy of rainfall prediction over drought regions. The algorithms are implemented in Python. The aim of this project is to identify which algorithm can be used to predict rainfall, in order to improve the country's socioeconomic status.


INTRODUCTION
Rainfall prediction remains a serious concern and has attracted the attention of governments, industries, risk management entities, and the scientific community. Rainfall is a climatic factor that affects many human activities, such as agricultural production, construction, power generation, forestry and tourism, among others. Rainfall prediction is therefore essential, since this variable is the one with the highest correlation with adverse natural events such as landslides, flooding, mass movements and avalanches. These events have affected society for years. Having an appropriate approach to rainfall prediction makes it possible to take preventive and mitigation measures against these natural phenomena.
These predictions also support the supervision of agricultural activities, construction, tourism, transport, and health, among others. For agencies responsible for disaster prevention, accurate meteorological predictions can support decision making in the face of the possible occurrence of natural events.
Over the last few years, machine learning has been used as a successful tool in regression for solving complex problems. Deep learning is a general term for a family of multilayer architectures that are trained using unsupervised algorithms. The main improvement is learning a compact, valid, and nonlinear representation of the data by means of unsupervised methods, in the hope that the new representation contributes to the prediction task at hand. This approach has been successfully applied to fields such as computer vision, image recognition, natural language processing, and bioinformatics. Deep learning has shown promise for modeling time-series data through techniques such as the Restricted Boltzmann Machine (RBM), Conditional RBM, autoencoder, recurrent neural network, convolution and pooling, and Hidden Markov Model.
Rainfall prediction helps to avoid floods, which saves lives and property. It also helps in managing water resources. Advance information about rainfall helps farmers manage their crops better, which contributes to the growth of the country's economy. Variability in the timing and amount of rainfall makes rainfall prediction a challenging task for meteorological scientists. Of all the services provided by meteorological departments, weather forecasting stands out as the most important for countries across the globe. The task is complex because it requires specialized expertise and every forecast is made under uncertainty. This paper reviews the different methods used for rainfall prediction in weather forecasting, together with their limitations. The regression algorithms used for prediction are discussed step by step in detail.

II.
RELATED WORK
P. Goswami  In these months, rainfall events are certain to occur.
The authors use average humidity and average wind speed as explanatory variables. The experiments were carried out with three different types of networks: Feed Forward Back Propagation, Layer Recurrent, and Cascaded Feed Forward Back Propagation. The results obtained with each network were then compared, showing that the Feed Forward Back Propagation network achieved the best results. Liu et al. propose an alternative to the previous model. They explore the use of genetic algorithms as a feature selection algorithm, followed by Naive Bayes as the predictive algorithm. The problem is decomposed into two prediction problems: rainfall occurrence (i.e., a binary prediction problem), and a classification of rainfall intensity when rainfall is present (i.e., light, moderate and strong rainfall). The adoption of genetic algorithms for input selection shows that it is possible to reduce the complexity of the dataset while obtaining similar or slightly better performance. The data is used to predict the change in weather over the next 24 hours, given four variables: temperature, dew point, mean sea level pressure (MSLP) and wind speed. The results obtained by the authors show that the DNN provides a good feature space for weather datasets and is a potential tool for feature fusion in time-series problems. However, they do not apply their model to more difficult weather data, such as rainfall datasets.

III.
DATA AND METHODOLOGY
The annual rainfall data was collected from India Water Portal. A statistical regression technique is applied to this data to develop a model that predicts annual rainfall values. Regression is a statistical method that uses the relationship between two or more variables in an observational database to predict an outcome from the other variables. There are many types of regression analysis, of which linear regression is especially widely applied because it is simple. The quantitative variables are assumed to be linearly related to each other. There are basically two types of linear regression: simple linear regression and multiple linear regression. The multiple linear regression that is used in this study

IV.
ALGORITHM: MULTIPLE LINEAR REGRESSION
Massie and Rose (1997) applied a simple linear regression method to predict daily maximum temperatures sampled at Nashville, Tennessee.
Regression attempts to determine the strength of the relationship between one dependent variable, usually denoted by Y, and a series of other changing variables known as independent variables. In simple regression there are only two variables, one dependent and the other independent, and the relationship between them takes the form below, where A is the intercept and B is the slope. This is known as the deterministic model:
Y = A + BX
The research methodology has been designed to achieve the objective of this study, which is to establish a new ensemble model of multiple machine learning techniques for rainfall prediction. To do so, a research framework consisting of several stages is used. The first stage, the dataset stage, identifies the data examined in this study by describing its source, details and quantity. The second stage, preprocessing, prepares the data for training. It includes two tasks: cleaning, which handles missing values, and normalization, which restricts values to a specific range. The third stage establishes a comparative analysis among the techniques, namely Multiple Linear Regression, Random Forest (RF) and AdaBoost, in order to identify the best regression algorithm.
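The preprocessing and regression stages described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the rainfall series is synthetic, and the use of NumPy, mean imputation, min-max scaling to [0, 1], three lagged years as predictors, and all function names are choices made for this example.

```python
import numpy as np

def fill_missing(series):
    """Cleaning: replace NaN entries with the mean of the observed values."""
    series = np.asarray(series, dtype=float)
    return np.where(np.isnan(series), np.nanmean(series), series)

def min_max_scale(series):
    """Normalization: map values linearly into the range [0, 1]."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

def make_lagged(series, n_lags):
    """Build rows [y(t-n_lags), ..., y(t-1)] with target y(t)."""
    columns = [series[i:len(series) - n_lags + i] for i in range(n_lags)]
    return np.column_stack(columns), series[n_lags:]

def fit_linear(X, y):
    """Least-squares fit of Y = A + B1*X1 + ... + Bk*Xk; returns (A, B)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict(X, A, B):
    """Evaluate the fitted model on the rows of X."""
    return A + X @ B

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic annual series for 1901-2015 (made-up values, not the real dataset).
rng = np.random.default_rng(0)
years = np.arange(1901, 2016)
raw = 1000.0 + 150.0 * np.sin(years / 5.0) + rng.normal(0.0, 30.0, len(years))
raw[[10, 57]] = np.nan                      # simulate two missing records

series = min_max_scale(fill_missing(raw))   # cleaning, then normalization
X, y = make_lagged(series, n_lags=3)        # three prior years as predictors

split = int(0.8 * len(y))                   # chronological train/test split
A, B = fit_linear(X[:split], y[:split])
test_r2 = r2_score(y[split:], predict(X[split:], A, B))
print(f"intercept={A:.3f}, coefficients={B}, test R2={test_r2:.3f}")
```

For brevity the sketch scales the whole series before splitting; a stricter evaluation would compute the scaling range from the training portion only.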

ADABOOST
Boosting is a general ensemble method that builds a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first model. Models are added until the training set is predicted perfectly or a maximum number of models has been added.
AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting.
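The boosting loop described above can be illustrated with a small sketch of AdaBoost for binary classification. It assumes labels in {-1, +1} and uses one-feature threshold rules (decision stumps) as the weak classifiers; the function names and the toy data are hypothetical and chosen only for this example. The regression variant used for rainfall amounts differs in its loss and weighting details and is not shown here.

```python
import math

def stump_predict(stump, row):
    """A stump predicts `sign` when row[feature] <= threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if row[feature] <= threshold else -sign

def best_stump(X, y, weights):
    """Search all features and thresholds for the lowest weighted error."""
    best_err, best = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for w, row, label in zip(weights, X, y)
                          if stump_predict(stump, row) != label)
                if best_err is None or err < best_err:
                    best_err, best = err, stump
    return best_err, best

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def adaboost_fit(X, y, max_models=10):
    n = len(X)
    weights = [1.0 / n] * n                  # start with uniform sample weights
    ensemble = []
    for _ in range(max_models):
        err, stump = best_stump(X, y, weights)
        if err >= 0.5:                       # weak learner no better than chance
            break
        err = max(err, 1e-10)                # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Stop once the training set is predicted perfectly.
        if all(adaboost_predict(ensemble, row) == label for row, label in zip(X, y)):
            break
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

# Toy data that no single stump can separate (made-up values).
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X_train, y_train, max_models=10)
predictions = [adaboost_predict(model, row) for row in X_train]
print(len(model), predictions)
```

Each later stump is fitted against reweighted data, so it concentrates on the samples the earlier stumps got wrong, which is the error-correcting behaviour the text describes.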

PROPOSED ARCHITECTURE
In this section, we describe the general architecture of our proposed model. As mentioned throughout the paper, we use a deep learning architecture to predict the accumulated rainfall for the next day. The architecture is composed of two networks: an autoencoder network and a multilayer perceptron network. The autoencoder network is responsible for feature selection; as mentioned, the autoencoder is a promising deep learning technique for feature processing in time series. The multilayer perceptron network is responsible for the classification or prediction task. Next we detail each network. The first component in our architecture is the autoencoder. An autoencoder is an unsupervised network that aims to extract nonlinear features from a data input. More specifically, an autoencoder is made up of three layers: the input layer, a hidden layer using the sigmoid activation function, and the output layer. Unlike classical neural networks, autoencoders are trained so that the output layer tries to be as similar as possible to the input layer. In this way, the hidden layer yields a nonlinear compact representation of the input layer, achieved thanks to the sigmoid activation function. The rationale behind this transformation is that the data will be more compact (i.e., less prone to overfitting) and that, hopefully, some interesting nonlinear relationships that improve the explanation of the output variable will have been found. In our architecture, the type of autoencoder used is a denoising autoencoder provided by Theano, a Python GPU-based library for mathematical optimization. The hidden layer of the autoencoder, the nonlinear compact representation of the original input, is directly connected to a multilayer perceptron (MLP). This network is the one responsible for making predictions in our problem, taking the new problem representation as input.
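To make the autoencoder component concrete, here is a minimal sketch of a denoising autoencoder with one sigmoid hidden layer. It is written in plain NumPy instead of the GPU library named in the text, and the linear output layer, Gaussian input noise, layer sizes, learning rate and synthetic data are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """Input layer -> sigmoid hidden layer -> linear output layer (assumed)."""

    def __init__(self, n_inputs, n_hidden, noise_std=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b2 = np.zeros(n_inputs)
        self.noise_std = noise_std

    def encode(self, X):
        """Nonlinear compact representation produced by the hidden layer."""
        return sigmoid(X @ self.W1 + self.b1)

    def reconstruct(self, X):
        return self.encode(X) @ self.W2 + self.b2

    def loss(self, X):
        """Mean squared reconstruction error on clean inputs."""
        return float(np.mean((self.reconstruct(X) - X) ** 2))

    def train_step(self, X, lr=0.5):
        """One gradient step: reconstruct the clean X from a noisy copy."""
        noisy = X + self.rng.normal(0.0, self.noise_std, X.shape)
        H = sigmoid(noisy @ self.W1 + self.b1)
        out = H @ self.W2 + self.b2
        d_out = 2.0 * (out - X) / X.size      # gradient of the mean squared error
        d_W2 = H.T @ d_out
        d_b2 = d_out.sum(axis=0)
        d_z = (d_out @ self.W2.T) * H * (1.0 - H)   # back through the sigmoid
        d_W1 = noisy.T @ d_z
        d_b1 = d_z.sum(axis=0)
        self.W1 -= lr * d_W1
        self.b1 -= lr * d_b1
        self.W2 -= lr * d_W2
        self.b2 -= lr * d_b2

# Synthetic stand-in for four normalized weather variables (made-up values).
data_rng = np.random.default_rng(1)
X = data_rng.random((50, 4))

ae = DenoisingAutoencoder(n_inputs=4, n_hidden=2)
initial_loss = ae.loss(X)
for _ in range(2000):
    ae.train_step(X, lr=0.5)
final_loss = ae.loss(X)

codes = ae.encode(X)   # these hidden codes would be the input to the MLP
print(f"loss before={initial_loss:.4f}, after={final_loss:.4f}, codes shape={codes.shape}")
```

In the architecture described above, `codes` plays the role of the new problem representation handed to the multilayer perceptron; the MLP itself is omitted here.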
The MLP consists of one hidden layer and uses the sigmoid activation function. In multiple linear regression models, the assessment of fit may become biased because a larger number of predictors causes overfitting. As the number of predictors increases, the apparent accuracy of the model, as measured by R2, also increases. Each time a new predictor is introduced, it may produce a better agreement between the model and the observed values, even if only by chance. Moreover, more predictors with higher-order polynomials lead the model to fit random noise, which also produces high values of R2 that can be misleading. To combat such problems, the Adjusted R2 is considered, which is adjusted according to the number of predictor variables. The value of Adjusted R2 increases when a new predictor improves the model more than would be expected by chance and decreases when a predictor improves the model less than would be expected by chance. Interestingly, the value of Adjusted R2 can even be negative, although this rarely happens. Its value never exceeds R2. For a good model, the difference between R2 and Adjusted R2 is small.
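The behaviour of Adjusted R2 described above follows from its standard definition, Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The short sketch below computes both quantities; the function names and the sample values are illustrative only and are not results from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R2 for the number of predictors used by the model."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Same R2, more predictors: the adjusted value drops.
few = adjusted_r_squared(0.90, n_samples=21, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=21, n_predictors=10)

# A weak fit with many predictors can give a negative adjusted value.
weak = adjusted_r_squared(0.05, n_samples=11, n_predictors=5)
print(few, many, weak)
```

Because the penalty factor (n - 1)/(n - p - 1) is at least 1 whenever p is non-negative, the adjusted value can never exceed R2, and the gap widens as predictors are added without a matching gain in fit.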

VI.
CONCLUSION
It is essential to estimate rainfall properly for improved water resources planning, development and management. A multiple linear regression model was developed to estimate the annual rainfall over India, using the annual rainfall values of the three preceding years. The model produced good results and matched the actual data closely, attaining a high coefficient of determination (R2) of 0.974 and an adjusted R2 of 0.963. Such a high R2 value is sufficient to justify the ability of the model to estimate annual rainfall over the area, which may help further hydrometeorological studies in future. This paper presented a review of different methods used for rainfall prediction and the problems one may encounter when applying different approaches to rainfall forecasting. Because of the nonlinear relationships in rainfall data and its ability to learn from past data, multiple linear regression is the best approach among all those available. The findings from this study offer a few contributions to the current literature. First, it was shown that the use of machine learning techniques yields a good to significant improvement in rainfall prediction models for the study area. It is important to note that, in general, the multiple linear regression method consistently outperforms the Random Forest (RF) and AdaBoost algorithms. Hopefully, the outcomes of this study will help in identifying a suitable machine learning technique that has a significant impact on improving the performance of rainfall forecasting.