A colleague recently asked for my help regarding some data he had entered into a local MySql database. He needed to update some 60k records and had 4 separate CSV files containing the raw data. He wrote a (bad) perl script which was taking too long to convert these CSV files into simple update statements - he later fixed his script to make it execute a lot quicker.
His update statement was something like:
UPDATE table_name SET column1 = 'csv_value_4' WHERE column2 = 'csv_value_3' and column3 = 'csv_value_1';
The first thing I thought of was to use a regular expression and my favorite text editor to do a quick search and replace.
Now I know there's a lot of ways to do this, and there's a lot of regular expressions you could probably use.
Assuming the data is mostly clean, and you don't need to worry about commas in between quotations. You could search for the words between each comma which might look something like, "(\w+)," (I'm using the Java Regex format here btw).
However, I think a lot of people forget that regular expressions have the ^ (not) expression. With that in mind, and (again) not worrying about commas inside of quotations you can create this simple regex to find anything between commas, "([^,]*)," which basically says, group anything that's not a comma before a comma.
In his CSV there were 4 values, so my regex was, "([^,]*),([^,]*),([^,]*),([^,]*)\n" (he had some empty values).
My replacement expression was:
UPDATE table_name SET column1 = '$4' WHERE column2 = '$3' and column3 = '$1';\n
I just opened each of his CSV files in jEdit and did a search and replace - Power of Not.
Friday, April 18, 2008
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment