
Lol. Tell me you never had to parse CSV files without telling me.

CSV files can be a nightmare to work with, depending on where they come from and on the liberties that were taken when generating or reading the file.

Use a goddamn battle-tested library, people, and don't reinvent the wheel. /oldman rant over



Yes, you eventually realize the hard way that "CSV" is actually a blanket term for various similar formats, with different rules and conventions. The way one program outputs CSVs may be completely different from another.
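
To make the "family of formats" point concrete, here is a minimal sketch using Python's standard csv module. The two sample strings and the `parse` helper are made up for illustration: one sample uses commas with CRLF line endings, the other uses semicolons with LF. They carry the same data but need different reader settings.

```python
import csv
import io

def parse(text, **fmt):
    """Parse CSV text into a list of rows; fmt is passed to csv.reader."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), **fmt))

# Hypothetical samples: same records, two different conventions.
comma_style = 'name,note\r\n"Smith, J.","said ""hi"""\r\n'
semicolon_style = 'name;note\n"Smith, J.";"said ""hi"""\n'

rows_comma = parse(comma_style)
rows_semicolon = parse(semicolon_style, delimiter=";")

# Both variants decode to the same rows once the right delimiter is given.
print(rows_comma == rows_semicolon)
```

Reading the semicolon variant with the default settings would not fail. It would just quietly return a single column per row, which is the kind of surprise the comment above is describing.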


Actually, I worked on OpenAddresses, a project to parse thousands of CSV files containing address data from many different county, state, and national systems around the world. It really wasn't that hard. Even Python's basic csv parser was sufficient for the task (and there are plenty of better options).

CSV is remarkably robust in practice.


Hey, CSV is hard, guys.

I've found template injection in a CSV upload before because they didn't anticipate a doublequote being syntactically relevant or something.

It was my job to find these things and I still felt betrayed by a file format I didn't realize wasn't just comma separated values only.
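
A minimal sketch of why the double quote matters, assuming a made-up two-line sample: splitting on commas treats quotes as ordinary characters, while Python's standard csv module treats them as field delimiters and unescapes doubled quotes.

```python
import csv
import io

# Hypothetical sample: the second field contains a comma and quote characters.
data = 'id,comment\n1,"He said ""hello, world"""\n'

# Naive approach: split every line on commas, ignoring quoting entirely.
naive = [line.split(",") for line in data.splitlines()]

# Quote-aware approach: let the csv module interpret the quoting.
proper = list(csv.reader(io.StringIO(data, newline="")))

print(naive[1])   # three fragments, quote characters left in place
print(proper[1])  # two fields, quotes unescaped
```

The naive version hands raw quote characters to whatever consumes the fields next, which is one plausible way for an upload handler to end up with input it never expected.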


Someone can find all these issues in 200 lines of code :) see sibling comment


This isn't my code, but gives you an idea of the level of complexity involved. But don't reimplement what you don't need to.

https://github.com/mafintosh/csv-parser/blob/master/index.js


To be clear my comment was meant as a joke.

Looking at the parser I see a few problems with it just by skimming the code. I'm not saying it wouldn't work or that it's not good enough for certain purposes.


Oh yeah? Such as? What purposes do you think it wouldn't be good for? The author will probably be interested in your feedback. Apparently it's getting over a million downloads a week on npm.


I have not used it, so this is mostly speculation, but I would be curious about character set handling, mixed line ending handling, large file handling, and handling of invalid and almost-valid files.

You can pick out some of the corner-case issues here: https://github.com/mafintosh/csv-parser/issues Also look at the ones that were solved: https://github.com/mafintosh/csv-parser/pulls
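
As a rough illustration of two of those concerns (character sets and mixed line endings), here is a sketch in Python rather than JavaScript, using only the standard library. The byte string is invented for the example: it starts with a UTF-8 byte order mark and mixes CRLF, LF, and bare CR line endings.

```python
import csv
import io

# Hypothetical file contents: UTF-8 BOM, non-ASCII text, three kinds of line ending.
raw = b"\xef\xbb\xbfcity,name\r\nZ\xc3\xbcrich,a\nOslo,b\rRome,c\r\n"

# Decoding as plain UTF-8 keeps the BOM as an invisible first character.
with_bom = raw.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
text = raw.decode("utf-8-sig")

# newline="" passes line endings through untranslated for the csv module.
rows = list(csv.reader(io.StringIO(text, newline="")))

print(rows)
print(with_bom.startswith("\ufeff"))
```

If the BOM is left in, the first header silently becomes "\ufeffcity", so a lookup by column name fails even though the file looks fine in an editor.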

Some interesting ones: https://github.com/mafintosh/csv-parser/pull/121 https://github.com/mafintosh/csv-parser/pull/151 https://github.com/mafintosh/csv-parser/issues/218

The author of the library has probably learned many, many lessons the hard way (and probably also had to prioritize some of the reported issues and feature requests along the way).

The above is not meant as a ding on the project itself and I am sure it is used successfully by many people. The point here is that your claim that you can easily write a csv parser in 200 lines of code does not hold water. It's anything but easy and you should use a battle tested library and not reinvent the wheel.


If you had read my original comment, you would see I didn't claim it's easy to do, only that it can be done in around 200 lines. That's clearly the case.

Character set handling isn't really an issue for JavaScript, as strings are always UTF-16. When a file is read into a string, the runtime handles the needed conversion.

As for handling large files, I've used this with 50 MB CSVs, which would need a 32-bit integer to index. Is that large enough? It's not like Windows Notepad, which can only read 64 KB files.


Windows Notepad can read multi-megabyte files. It can read files that are hundreds of megabytes. It's not pleasant, loading is incredibly slow, and resizing the window when reflow is enabled makes it take that much longer, but it's definitely possible.


My point was that it's not trivial and it's hard to get it right. The way I read your comment was that it's not hard and can easily be done in 200 lines. It's possible I misread it.

I think the original point I was making still stands.


It's not as bad as all that. There are some gotchas, sure, but you can cover them all with about 200 lines of code.

However, I would recommend using a tested library to do the parsing, sqlite for example, rather than rolling your own. Unless you have to, of course.



