
Importing CSV Data Into MySQL

Reading a CSV file and importing it into a MySQL database with Node.js

We’re going to be looking at how to read a CSV file and how to dump that data into a MySQL database.

Setting Everything Up

Let's go ahead and make sure we have these dependencies installed and imported:

const fs = require('fs');
const mysql = require('mysql');
const csv = require('fast-csv');

Loading The CSV File

Next, let's create a read stream so we can pull the data into an array:

let stream = fs.createReadStream("my_file.csv");

Now that we have our stream, let's use the CSV module to parse it. Note that we listen for two events (data and end). The data event fires each time a new line of the CSV has been parsed, and the end event fires once the file has been completely read.

let myArray = [];
let csvStream = csv
	.parse()

	.on("data", (data) => {
		myArray.push(data);
	})

	.on("end", () => {
			
	});

stream.pipe(csvStream);

For those of you who have already looked at a CSV file in a text editor, you may already be aware that the first line of the file contains the column headers. We want to get rid of those, so let's just shift the array, which removes its first element.

.on("end", function () {
    myArray.shift();
});

Inserting Multiple Records

I have already covered how to insert multiple records easily in my post Insert Multiple Records Into MySQL. It may be worth reading that before getting started on inserting the data into a database.

In the end event handler, we're going to open a new database connection and bulk insert the records into the database using MySQL's native bulk insert functionality.

.on("end", function () {
	// remove the first element in the array (the column headers)
	myArray.shift();

	// create a new connection to the database
	const connection = mysql.createConnection({
		host: 'localhost',
		user: 'username',
		password: 'super_secure_password',
		database: 'database_name'
	});

	// open the connection
	connection.connect((error) => {
		if (error) {
			console.error(error);
		} else {
			let query = 'INSERT INTO table (field1, field2, field3, field4) VALUES ?';
			connection.query(query, [myArray], (error, response) => {
				if (error) {
					console.log(error);
				} else {
					console.log(response); // x records inserted
				}
			});
		}
	});
});
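If the VALUES ? placeholder looks like magic: the mysql module expands a nested array bound to it into one bracketed tuple per row. The sketch below mimics that expansion in plain JavaScript purely as an illustration; it is not the driver's real code (the actual driver also escapes values and quotes strings with single quotes).

```javascript
// Rough sketch of how a nested array becomes a bulk VALUES list:
// each inner array turns into one (...) row tuple.
// Illustration only - not the mysql driver's real escaping logic.
function expandValues(rows) {
	return rows
		.map((row) => '(' + row.map((v) => JSON.stringify(v)).join(', ') + ')')
		.join(', ');
}

// expandValues([['a', 'b'], ['c', 'd']]) gives '("a", "b"), ("c", "d")'
```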

Known Bugs

Packet Too Large

There are a few issues I've had when implementing this. As discussed in Insert Multiple Records Into MySQL, a large CSV file can lead to the database throwing an error because the packet exceeds the 'max_allowed_packet' variable. You can see an easy fix for that below.

ER_NET_PACKET_TOO_LARGE

I was trying to bulk insert around 180k records from a CSV file, and I kept getting an error. When I traced the error back to the MySQL output in the console, I could see that my MySQL server was running with a fairly small 'max_allowed_packet'.

To check what packet size you currently have set, run the first command below; the second one updates the variable.

show variables like 'max_allowed_packet';
set global max_allowed_packet=33554432;
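If raising 'max_allowed_packet' isn't an option (e.g. on shared hosting), another way out is to insert the rows in smaller batches. This is only a sketch: the chunkRows helper is plain JavaScript, and the commented-out usage assumes the connection and query variables from the earlier snippet.

```javascript
// Split an array of rows into batches of at most `size` rows, so that
// each bulk INSERT stays comfortably under max_allowed_packet.
function chunkRows(rows, size) {
	const chunks = [];
	for (let i = 0; i < rows.length; i += size) {
		chunks.push(rows.slice(i, i + size));
	}
	return chunks;
}

// Hypothetical usage with the connection and query from earlier:
// for (const batch of chunkRows(myArray, 10000)) {
//     connection.query(query, [batch], (error, response) => {
//         console.log(error || response);
//     });
// }
```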

Incorrect DateTime For MySQL

Something else I had to deal with: the date field in the CSV did not match the DateTime field in the MySQL database. To solve this, I looped through the array and replaced the relevant element with a converted value. Something along the lines of this:

.on("end", function () {
	myArray.shift();
	for (let i = 0, len = myArray.length; i < len; i++) {
		myArray[i][3] = convertToMysql(myArray[i][3]);
	}
	// ... 

The 'convertToMysql' call in the loop above is a custom function I wrote. You will have to write your own, as your input may differ from mine.
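Purely for illustration, here's roughly what such a function could look like if the CSV dates happened to be in DD/MM/YYYY format. The input format is an assumption, so adjust the parsing to whatever your file actually contains.

```javascript
// Convert a 'DD/MM/YYYY' date string (assumed input format) into the
// 'YYYY-MM-DD' form that MySQL's DATE/DATETIME columns accept.
function convertToMysql(value) {
	const [day, month, year] = value.split('/');
	return `${year}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
}

// convertToMysql('21/03/2019') gives '2019-03-21'
```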

Side Note: If you don't want to shift the array to remove the first row, you can pass an options object to the CSV stream, see below.

csv
    .fromStream(stream, {headers : true})
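One caveat if you go the headers route (at least in the fast-csv versions this post targets): with headers enabled, each data event hands you an object keyed by column name instead of an array, while the bulk insert expects an array of arrays. A plain-JavaScript sketch of converting a row back, with made-up column names:

```javascript
// With headers enabled, a parsed row looks like this
// (column names here are just for illustration):
const row = { field1: 'a', field2: 'b', field3: 'c' };

// MySQL's bulk insert wants an array of arrays, so turn each
// row object back into an ordered array of its values:
const values = Object.values(row); // ['a', 'b', 'c']
```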

Errors In The CSV

This one is kind of situational, but you can listen for two events that may help you here. One is the 'error' event, and the other is the 'data-invalid' event. The error event gets triggered when an error occurs, such as the file not being found or not being valid CSV. The data-invalid event will be called when a row contains invalid data.

.on('error', (error) => {
	// error event handler
});

.on('data-invalid', (data) => {
	// data was invalid
});

Side Note: “Any rows consisting of nothing but empty strings and/or commas will be skipped, without emitting a ‘data’ or ‘error’ event.”

Whole Thing

If you want to double-check your code after reading through, it might look something like this:

const fs = require('fs');
const mysql = require('mysql');
const csv = require('fast-csv');

let stream = fs.createReadStream("path_to_my_file.csv");
let myData = [];
let csvStream = csv
	.parse()
	.on("data", function (data) {
		myData.push(data);
	})
	.on("end", function () {
		myData.shift();

		// create a new connection to the database
		const connection = mysql.createConnection({
			host: 'localhost',
			user: 'username',
			password: 'super_secure_password',
			database: 'database_name'
		});

		// open the connection
		connection.connect((error) => {
			if (error) {
				console.error(error);
			} else {
				let query = 'INSERT INTO table (field1, field2, field3, fieldn) VALUES ?';
				connection.query(query, [myData], (error, response) => {
					console.log(error || response);
				});
			}
		});
	});

stream.pipe(csvStream);

And that just about does it for this. I hope you guys found it useful! If you have any issues or suggestions, please don’t hesitate to comment!


Nicholas Mordecai

Just your friendly neighbourhood programmer!


12 Comments

    1. Yes, but only when the CSV file is held in memory. Once that process has completed and exited, or you have disposed of the file then your CSV will no longer take up memory. You can also chunk the logic if you want to minimise (say 100k rows at a time)

  1. Have you encountered this error:

    [367116:000001A6A4B1B6D0] 27189 ms: Mark-sweep 1402.6 (1429.3) -> 1402.2 (1431.8) MB, 1575.4 / 0.0 ms (average mu = 0.091, current mu = 0.003)
    allocation failure scavenge might not succeed
    [367116:000001A6A4B1B6D0] 29322 ms: Mark-sweep 1404.7 (1431.8) -> 1404.4 (1433.8) MB, 2130.0 / 0.0 ms (average mu = 0.041, current mu = 0.002)
    allocation failure scavenge might not succeed

    ==== JS stack trace =========================================

    0: ExitFrame [pc: 000002AFB135C5C1]
    1: StubFrame [pc: 000002AFB135D9BF]
    Security context: 0x0290a8a1e6e1
    2: arrayToList [000002712D755719] [C:\Users\shaunpow\Documents\BitBucketLocalRepo\Testing-space\node_modules\sqlstring\lib\SqlString.js:~60] [pc=000002AFB160F9E8](this=0x02288173ff71 ,array=0x01c18c340909 ,timeZone=0x02712d756161 )
    3: arrayToList [0…

    FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed – JavaScript heap out of memory
    1: 00007FF7BEAE0EFA v8::internal::GCIdleTimeHandler::GCIdleTimeHandler+4810
    2: 00007FF7BEABA296 node::MakeCallback+4518

  2. Hi,
    I tried this code, but I found this error: "UnhandledPromiseRejectionWarning: RequestError: Incorrect syntax near '?'".
    How do I solve this issue?

  3. const express = require('express');
    const router = express.Router({
        mergeParams: true
    });
    const multer = require('multer');
    const fs = require('fs');
    const mysql = require('mysql');
    const csv = require('fast-csv');

    // SET STORAGE
    var storage = multer.diskStorage({
        destination: function (req, file, cb) {
            cb(null, 'uploads')
        },
        filename: function (req, file, cb) {
            cb(null, file.fieldname + '-' + Date.now())
        }
    })

    var upload = multer({ storage: storage })

    router.post('/uploadfile', upload.single('myFile'), (req, res, next) => {
        const file = req.file
        if (!file) {
            const error = new Error('Please upload a file')
            error.httpStatusCode = 400
            return next(error)
        }

        let stream = fs.createReadStream(file.originalname);
        let myData = [];
        let csvStream = csv
            .parse()
            .on("data", function (data) {
                myData.push(data);
            })
            .on("end", function () {
                myData.shift();

                // create a new connection to the database
                const connection = mysql.createConnection({
                    host: 'localhost',
                    user: 'root',
                    password: '',
                    database: 'dbname'
                });

                // open the connection
                connection.connect((error) => {
                    if (error) {
                        console.error(error);
                    } else {
                        let query = 'INSERT INTO tbl_bank_statement (id,transaction_id,dr_amount,cr_amount,final_balance,created_date,updated_date) VALUES ?';
                        connection.query(query, [myData], (error, response) => {
                            console.log(error || response);
                        });
                    }
                });
            });

        stream.pipe(csvStream);
    })

    module.exports = router;

    Error:

    events.js:288
    throw er; // Unhandled 'error' event
    ^

    Error: ENOENT: no such file or directory, open 'bank.csv'
    Emitted 'error' event on ReadStream instance at:
    at internal/fs/streams.js:132:12
    at FSReqCallback.oncomplete (fs.js:154:23) {
    errno: -2,
    code: 'ENOENT',
    syscall: 'open',
    path: 'bank.csv'
    }

    file is properly uploaded into the folder but inside a database isn’t…please help

    1. Hey, it looks like you’re trying to load a file that doesn’t exist. Can you try using __dirname or saving to a static directory? If that doesn’t fix it, then it may be an async issue where the IO is taking N time to save the file to disk, but the csv parser tries to read a file that hasn’t finished saving yet.
