Returns of observations with a unique timestamp are written to Cloud Storage. This reduces egress and ensures that BigQuery does not receive duplicate data. During the first run of the script, all observations from 2017 onward are loaded into BigQuery.
Subsequent runs add only the incremental, “unseen” observations. In the final stage of ETL, the processed data is written to BigQuery. Aggregating data in BigQuery allows other services to retrieve it cost-effectively.
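The incremental behaviour described above can be illustrated with a small filter. This is a minimal sketch, not the project's actual script: the function name `select_unseen`, the record fields (`ticker`, `timestamp`, `ret`), and the idea of a single "last loaded" watermark are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def select_unseen(observations, last_loaded=None):
    """Return observations newer than `last_loaded`, keeping one per key.

    `observations` is a list of dicts with hypothetical fields
    "ticker" and "timestamp". `last_loaded` is the newest timestamp
    already in the warehouse, or None on the first run.
    """
    seen_keys = set()
    fresh = []
    for obs in sorted(observations, key=lambda o: o["timestamp"]):
        # Skip anything already loaded by a previous run.
        if last_loaded is not None and obs["timestamp"] <= last_loaded:
            continue
        # Skip duplicates within the current batch.
        key = (obs["ticker"], obs["timestamp"])
        if key in seen_keys:
            continue
        seen_keys.add(key)
        fresh.append(obs)
    return fresh


day1 = datetime(2017, 1, 3, tzinfo=timezone.utc)
day2 = datetime(2017, 1, 4, tzinfo=timezone.utc)
batch = [
    {"ticker": "AAA", "timestamp": day1, "ret": 0.010},
    {"ticker": "AAA", "timestamp": day1, "ret": 0.010},  # duplicate
    {"ticker": "AAA", "timestamp": day2, "ret": -0.004},
]

first_run = select_unseen(batch)         # full history, duplicates dropped
later_run = select_unseen(batch, day1)   # only data newer than the watermark
```

On the first run the watermark is empty, so the whole history passes through; later runs keep only rows newer than the watermark, which is one way to avoid sending duplicates downstream.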
Investor risk preferences
The investor risk preferences (IRP) are a synthetic dataset containing historical records of thousands of existing retail investors. This dataset is a crucial component for making personalized recommendations based on an individual’s investment preferences.
Risk aversion is the target variable. Average monthly income, education, loans, and deposits are among the 15 independent variables. Investors’ attributes are generated using different continuous probability distributions: Gamma, Gumbel, Gaussian, R-distributed, and others. A script produces monthly snapshots of investors’ attributes, resulting in 48,000 data points.
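To make the generation step more concrete, here is a minimal sketch of how monthly snapshots could be drawn from Gamma, Gumbel, and Gaussian distributions using only the Python standard library. The function names, the two attributes shown, the investor and month counts, and every distribution parameter are placeholders chosen for illustration, not the values used for the IRP dataset.

```python
import math
import random


def gumbel(rng, loc, scale):
    """Draw from a Gumbel distribution by inverse-transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() may return exactly 0.0
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


def generate_snapshots(n_investors, n_months, seed=0):
    """Return one row per investor per month (hypothetical schema)."""
    rng = random.Random(seed)
    rows = []
    for investor_id in range(n_investors):
        # Per-investor baselines (placeholder parameters).
        base_income = rng.gammavariate(2.0, 1500.0)   # Gamma(shape, scale)
        base_deposits = gumbel(rng, 5000.0, 2000.0)   # Gumbel(loc, scale)
        for month in range(n_months):
            rows.append({
                "investor_id": investor_id,
                "month": month,
                # Gaussian noise makes each monthly snapshot differ slightly.
                "avg_monthly_income": base_income * (1.0 + rng.gauss(0.0, 0.05)),
                "deposits": max(0.0, base_deposits + rng.gauss(0.0, 250.0)),
            })
    return rows


snapshots = generate_snapshots(n_investors=10, n_months=12, seed=42)
```

Seeding the generator keeps the synthetic dataset reproducible between runs, which is convenient when the data is regenerated by an automated trigger.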
A Cloud Function triggers generation of the dataset upon the first launch of IPRE. Dataflow then migrates the generated dataset from Cloud Storage to BigQuery.
Machine learning advanced analytics
The machine learning (ML) workflow is as follows:
Raw data is preprocessed and uploaded to Cloud Storage (GCS). A Dataflow job is registered through Cloud Composer.
Processed data is uploaded to BigQuery with a predefined schema and data format.
A Pub/Sub trigger starts the training of the AutoML and ARIMA models. Training is performed with the integrated BigQuery ML tools.
When training has completed, the system triggers the inference process.
Individual risk preferences and ticker prices are predicted using the uploaded BigQuery data as input.
Predicted results are saved to Cloud Storage to cache the results and make the data reusable.
Results are published through the recommendation engine, which is deployed on Cloud Run, and prediction results are sent to the end user.
The workflow is shown in Figure 2.
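The training and inference steps could be expressed as BigQuery ML statements along the following lines. This is a sketch under stated assumptions: the helper names, the table, model, and column names are hypothetical, `ARIMA_PLUS` is assumed as the BigQuery ML model type for the ARIMA models, and `AUTOML_REGRESSOR` is assumed for risk aversion (a categorical risk label would call for `AUTOML_CLASSIFIER` instead).

```python
def arima_training_sql(model, source_table, ts_col, value_col, id_col):
    """Build a CREATE MODEL statement for a per-ticker time-series model."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{ts_col}',
  time_series_data_col = '{value_col}',
  time_series_id_col = '{id_col}'
) AS
SELECT {ts_col}, {value_col}, {id_col}
FROM `{source_table}`
""".strip()


def automl_training_sql(model, source_table, label_col):
    """Build a CREATE MODEL statement for an AutoML tabular model."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS(
  model_type = 'AUTOML_REGRESSOR',
  input_label_cols = ['{label_col}']
) AS
SELECT * FROM `{source_table}`
""".strip()


def forecast_sql(model, horizon):
    """Build an inference query that forecasts `horizon` future steps."""
    return (
        f"SELECT * FROM ML.FORECAST(MODEL `{model}`, "
        f"STRUCT({int(horizon)} AS horizon))"
    )


# Hypothetical dataset, table, and column names.
train_prices = arima_training_sql(
    "ipre.ticker_arima", "ipre.prices", "date", "close", "ticker"
)
train_risk = automl_training_sql(
    "ipre.risk_automl", "ipre.investor_snapshots", "risk_aversion"
)
predict_prices = forecast_sql("ipre.ticker_arima", 30)
```

Each string can then be submitted to BigQuery, for example with the `google-cloud-bigquery` client (`bigquery.Client().query(sql).result()`), from whichever component reacts to the Pub/Sub message.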