DATA STREAMING IN MULESOFT 4


- Bharath Venkata

This blog post focuses on solving huge data transfer issues in Mule 4 with the concept of streaming.

The main problems with a huge data load transfer are Java heap out-of-memory errors and degraded CPU performance. To address this, MuleSoft has a beautiful concept named STREAMING ✊


Activity :- Lift and shift 2M+ records from a CSV file into a Postgres DB using the streaming concept.


Step 1 : To load the 2M+ records into the database we need a huge sample dataset, which can be found at the link below:

Download Sample CSV Files for free - Datablist


Step 2 : We need to load this sample CSV file, which holds 2M+ records, into the database using the MuleSoft streaming concept.


Mule flow for transferring 2M+ CSV records to the database


File Read config :
Enable streaming in the MIME Type section as shown below.
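In flow XML, that File Read configuration looks roughly like the sketch below (the config name and file path are illustrative placeholders, not taken from this post); the key part is the streaming=true parameter on the output MIME type:

```xml
<!-- Read the CSV as a stream: streaming=true tells the CSV reader to
     hand rows to the flow incrementally instead of loading the whole
     2M-record file into memory at once. -->
<file:read doc:name="Read people CSV"
           config-ref="File_Config"
           path="people-2000000.csv"
           outputMimeType="application/csv; streaming=true"/>
```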







DataWeave Streaming :
DataWeave supports end-to-end streaming through a flow in a Mule application. Streaming speeds the processing of large documents without overloading memory.


To learn more about the data streaming concept in DataWeave, here is the link :

DataWeave snippet for data streaming:
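A minimal sketch of what such a streaming transform might look like (illustrative, not the exact script from this flow): with streaming enabled on the reader and deferred=true on the writer, DataWeave processes rows as they arrive rather than materializing all 2M records in memory.

```dataweave
%dw 2.0
output application/csv deferred=true
---
// Pass each row through unchanged; deferred=true lets the writer
// emit rows downstream as soon as they are produced.
payload map ((row) -> row)
```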



Time for the database insert

Here is the For Each block config, which has a batch size of 3000.

Here is the bulk insert query:

INSERT INTO local.people
  (Index, User_Id, First_Name, Last_Name, Sex, Email, Phone, dob, JobTitle) VALUES
  (:Index, :User_Id, :First_Name, :Last_Name, :Sex, :Email, :Phone, :dob, :JobTitle)
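Put together, the For Each plus bulk insert stage might look like this in the flow XML (the doc:name values and Database config name are assumptions for illustration; the SQL is the bulk insert query shown in this post):

```xml
<!-- Iterate over the streamed rows in chunks of 3000; each iteration
     receives one batch of rows as the payload. -->
<foreach doc:name="For Each" batchSize="3000">
    <!-- Bulk insert the current batch in a single round trip; each
         :placeholder is resolved from the matching column of every row. -->
    <db:bulk-insert config-ref="Database_Config">
        <db:bulk-input-parameters><![CDATA[#[payload]]]></db:bulk-input-parameters>
        <db:sql><![CDATA[INSERT INTO local.people
            (Index, User_Id, First_Name, Last_Name, Sex, Email, Phone, dob, JobTitle) VALUES
            (:Index, :User_Id, :First_Name, :Last_Name, :Sex, :Email, :Phone, :dob, :JobTitle)]]></db:sql>
    </db:bulk-insert>
</foreach>
```

Batching 3000 rows per bulk insert trades a little memory for far fewer database round trips, which is what keeps the load time low.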




We are not done yet ..😅
Database table creation :
Create the target table in your local database with the query below:

CREATE TABLE local.people (
  Index VARCHAR PRIMARY KEY,
  User_Id VARCHAR,
  First_Name VARCHAR,
  Last_Name VARCHAR,
  Sex VARCHAR,
  Email VARCHAR,
  Phone VARCHAR,
  dob VARCHAR,
  JobTitle VARCHAR
);


Eyyyyy, we are good to go... Let's do a run and see 💪

Before Run 
Database records count 
After Run 
Database records count

Note : Data loads into the database in chunks; while it is loading you can check the count at any point with the query

select count(*) from local.people


A hassle-free transfer of 2M records, done very smoothly without affecting performance 💁

Load Time : 5 mins ✌



Comments

  1. Good blog venkat. Can you advise how much vcore consumed by the app for this task? Thanks

  2. 0.1 vCore * 2 with 1 worker each is enough... It won't cost much CPU as it is a synchronized process... after one chunk of data is consumed, it goes for another fetch, and so on.

  3. in this POC, which database connection was used?

