[okfn-labs] Python Iterator over table (csv) *columns*

Edgar Zanella Alvarenga e at vaz.io
Wed Dec 17 14:52:57 UTC 2014


You can use read_csv from Pandas:

http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.io.parsers.read_csv.html

usecols : array-like
     Return a subset of the columns. Results in much faster parsing time and lower memory usage.

Pass the columns you want to the `usecols` argument. If the csv file is
too large to fit in memory, you can read it in chunks with:

pandas.read_csv(filepath, sep=DELIMITER, skiprows=INITIAL_LINES_TO_SKIP, chunksize=10000)

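With chunksize set, read_csv returns an iterator that yields one DataFrame
of up to 10000 rows at a time, so you can loop over the result directly.
INITIAL_LINES_TO_SKIP then only has to cover any lines you want to skip at
the start of the file.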
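
Here is a rough sketch of how the two options might be combined to walk over selected columns without loading the whole file. The helper name iter_column_chunks and the sample data are made up for illustration; only read_csv with `usecols`, `sep` and `chunksize` comes from pandas.

```python
import io

import pandas as pd


def iter_column_chunks(source, columns, chunksize=10000, sep=","):
    """Yield (column_name, Series) pairs, one chunk of rows at a time.

    Hypothetical helper: only the requested columns are parsed, and at
    most `chunksize` rows are held in memory at once.
    """
    reader = pd.read_csv(source, sep=sep, usecols=columns, chunksize=chunksize)
    for chunk in reader:          # each chunk is a DataFrame
        for name in columns:      # hand back one column (Series) at a time
            yield name, chunk[name]


# Made-up sample data standing in for a real file path.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
for name, values in iter_column_chunks(sample, ["a", "c"], chunksize=2):
    print(name, values.tolist())
```

This gives the columns chunk by chunk rather than one full column after another. If you need a whole column before moving on to the next one, an alternative would be to call read_csv once per column with `usecols=[name]`, at the cost of re-reading the file each time.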

Edgar

On 17/12/2014 05:23, Paul Walsh wrote:
> Hi,
>
> Does anyone have or know of a nice (existing) solution for iterative
> reading of CSV/table data by *column*? It needs to be an iterator - I
> don’t want everything in memory.



