Import CSV file and query it with LINQ


How to easily import CSV file and query it with LINQ Assume that you have an plain text, old Comma Separated Values file filled with your precious export from a legacy system. How can you process it easily now? The first answer that comes to mind is to parse it and load it into a datatable and later process it by using DataTable.Select() method. But this approach has some limitations – like splitting data into several tables and then join them.

One would imagine that parsing CSV files is a straightforward and boring task, given that it is quite a while since CSV is around. Some of them are correct – in the sense that many implementations merely use some splitting method like String.Split(). Some don’t even offer the specification of the values splitting character – so your file wouldn’t be parsed correctly if instead of , you have ; as separator – yet another thing to modify if you’re lucky enough to have the sources. Others will not handle properly field values with commas because the simple split method of the String class. But there are better implementations that take care about escaped quotes, trimming spaces before and after fields and other small and useful details, but very few that I found did it all as I liked it – and at least as importantly, in a fast and efficient manner.

After trying several methods to parse the csv file, I endup using Matt Perdecks LINQ to CSV library presented in its article on The Code Project website.

Among the feature that I used and liked very much I would mention (from the article):

  • Follows the most common rules for CSV files. Correctly handles data fields that contain commas and line breaks.
  • In addition to comma, most delimiting characters can be used, including tab for tab delimited fields.
  • Can be used with an IEnumerable of an anonymous class – which is often returned by a LINQ query.
  • Supports deferred reading.
  • Supports processing files with international date and number formats.
  • Supports different character encodings if you need them.
  • Recognizes a wide variety of date and number formats when reading files.
  • Provides fine control of date and number formats when writing files.
  • Robust error handling, allowing you to quickly find and fix problems in large input files.

Using the library is straight-forward: after downloading the zip file from the The Code Project (this required that you have an account there) you can start using it by including it as a referenced library in your project.

Next step is to define the layout of your csv file, in a manner that CSVtoLINQ to be able to map the data from file (which is pure text) to correct data types :

using LINQtoCSV;
using System;

class Product
{
    [CsvColumn(Name = "ProductName", FieldIndex = 1)]
    public string Name { get; set; }

    [CsvColumn(FieldIndex = 2, OutputFormat = "dd MMM HH:mm:ss")]
    public DateTime LaunchDate { get; set; }

    [CsvColumn(FieldIndex = 3, CanBeNull = false, OutputFormat = "C")]
    public decimal Price { get; set; }

    [CsvColumn(FieldIndex = 4)]
    public string Country { get; set; }

    [CsvColumn(FieldIndex = 5)]
    public string Description { get; set; }
}

As you can see, there are lots of types that you can apply to your class.

Then you have to declare the using clause in your source file, where you want to use it:

using LINQtoCSV;

Then, all you need is to start using the library by adding this code

private void btnLoad_Click(object sender, EventArgs e) {
  openFile.Filter = "CSV Files|*.csv";
  openFile.Title = "Open CSV File";

  if (openFile.ShowDialog() == DialogResult.OK) {
    textBox1.Text = openFile.FileName;

    CsvFileDescription inputFileDescription = new CsvFileDescription {
      // cool - I can specify my own separator!
      SeparatorChar = ';',
      FirstLineHasColumnNames = true
    };

    CsvContext cc = new CsvContext();

    bEvents =
      cc.Read(openFile.FileName, inputFileDescription);

    // elegantly parse the LINQ outputed by the parser
    var bALLEvents =
      from p in bEvents
      orderby p.EventID
      select new { p.EventID, p.EventName, p.SourceName, p.TargetName };

    // binding to DataGridView
    // Notice the conversion to a List of the bALLEvents collection
    dataGridView1.DataSource = bALLEvents.ToList();
  }
}

That’s it, folks – Enjoy and share!

Leave a comment

Your email address will not be published. Required fields are marked *