I am using the code below to edit a CSV file with Python. The functions called in the code are defined in the upper part of the code.
Problem: I want the code below to start editing the CSV from the 2nd row and to exclude the 1st row, which contains the headers. Right now the functions are applied to the 1st row as well, so my header row gets changed.
I tried to solve this problem by initializing the `row` variable to 1, but it didn't work.
Please help me in solving this issue.
Martijn Pieters♦
user1915050
3 Answers
Your `reader` variable is an iterable; by looping over it you retrieve the rows.
To make it skip one item before your loop, simply call `next(reader, None)` and ignore the return value.
You can also simplify your code a little: use the opened files as context managers to have them closed automatically.
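A minimal sketch of that approach, assuming Python 3. The `process_row` function and the file names are hypothetical stand-ins for the question's own functions and paths, which are not shown here.

```python
import csv
import os
import tempfile


def process_row(row):
    # Hypothetical stand-in for the question's own row-editing functions.
    return [cell.upper() for cell in row]


def edit_csv_skip_header(in_path, out_path):
    # Both files are closed automatically when the with block ends.
    # newline="" is what the csv module documentation recommends for file objects.
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        # Consume the header row; the None default avoids StopIteration on an empty file.
        next(reader, None)
        for row in reader:
            writer.writerow(process_row(row))


# Small demonstration with throwaway files (assumed sample data).
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "in.csv")
out_path = os.path.join(tmpdir, "out.csv")
with open(in_path, "w", newline="") as f:
    f.write("name,city\nalice,paris\nbob,rome\n")

edit_csv_skip_header(in_path, out_path)
with open(out_path, newline="") as f:
    result = list(csv.reader(f))
print(result)
```

Note that in this version the header is dropped from the output entirely, because it is read and then discarded.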
If you wanted to write the header to the output file unprocessed, that's easy too: pass the output of `next()` to `writer.writerow()`.
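As a rough sketch of the header pass-through, again assuming Python 3: in-memory `io.StringIO` objects stand in for the real files, and `process_row` is a hypothetical placeholder for the per-row edits.

```python
import csv
import io


def process_row(row):
    # Hypothetical placeholder for the real per-row edits.
    return [cell.strip().title() for cell in row]


def edit_csv_keep_header(infile, outfile):
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    headers = next(reader, None)  # first row, or None if the input is empty
    if headers is not None:
        writer.writerow(headers)  # the header is written out unchanged
    for row in reader:
        writer.writerow(process_row(row))


# Assumed sample data for demonstration only.
source = io.StringIO("name,city\nalice, paris\nbob,rome\n")
target = io.StringIO()
edit_csv_keep_header(source, target)
output_text = target.getvalue()
print(output_text)
```

The `if headers is not None` guard is only there so that an empty input does not cause `writer.writerow(None)` to fail.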
Martijn Pieters♦
Another way of solving this is to use the `DictReader` class, which 'skips' the header row and uses it to allow named indexing.
Given 'foo.csv' as follows:
Use DictReader like this:
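For illustration, here is a small sketch assuming Python 3. The column names and values are hypothetical stand-ins for the contents of 'foo.csv', and an in-memory buffer is used in place of the real file.

```python
import csv
import io

# Hypothetical contents standing in for foo.csv.
foo_csv = io.StringIO("FirstColumn,SecondColumn\nasdf,1234\nqwer,5678\n")

# DictReader reads the first row as field names and yields only the data rows.
reader = csv.DictReader(foo_csv)
rows = []
for row in reader:
    # Each row is a mapping keyed by the header names; values are strings.
    rows.append((row["FirstColumn"], int(row["SecondColumn"])))

print(reader.fieldnames)
print(rows)
```

With a real file, the same loop would run over `csv.DictReader(f)` inside a `with open('foo.csv', newline='') as f:` block.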
Chad Zawistowski
Doing `row=1` won't change anything, because you'll just overwrite that with the results of the loop.
You want to do `next(reader)` to skip one row.
Katriel