Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added ignoreQuoteInToken support to ignore quotes in strings #46

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ranjithrp
Copy link

added ignoreQuoteInToken support to ignore quotes in strings even when there are few encapsulatedTokens with comma within. This will help in parsing csv values like
abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

…n there are few encapsulatedTokens with comma within
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 92.618% when pulling ad01ee1 on ranjithrp:master into 7754cd4 on apache:master.

@coveralls
Copy link

coveralls commented Jun 21, 2019

Coverage Status

Coverage increased (+0.08%) to 92.913% when pulling a381d53 on ranjithrp:master into 7754cd4 on apache:master.

@garydgregory
Copy link
Member

@ranjithrp ,
You'll need to provide unit tests so we can clearly assess what it is you are trying to achieve.
Thank you,
Gary

@ranjithrp
Copy link
Author

@ranjithrp ,
You'll need to provide unit tests so we can clearly assess what it is you are trying to achieve.
Thank you,
Gary

i have added the junits.

@garydgregory
Copy link
Member

Would you mind showing an example with actual expected rows please? It is not clear to me yet if this is a good thing.

@ranjithrp
Copy link
Author

ranjithrp commented Jun 22, 2019

Would you mind showing an example with actual expected rows please? It is not clear to me yet if this is a good thing.

In the actual expected row, we have one column which has value with quotes in it, and other column which has a comma in it.
Ex row -
abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

Here the second column has value "xyz" 123 bar. This has quotes in the token
The 5th column has value First author, Second Author. This has comma in the token

if i use withQuote(null), it ignores the quote for the fifth column and then splits that value to 2.
If i dont use withQuote(null), for the second column, the lexer tries to parseEncapsulatedToken and when it sees character other than delimiter or newline, it throws exception.

This was the problem we were facing, and to handle this, i had made the above change.

@ranjithrp
Copy link
Author

@garydgregory Hope the above explanation clarifies your question. Please let me know if you need any additional details. Thanks

@garydgregory
Copy link
Member

@ranjithrp
I would prefer to see a unit test that that compares your example input with an actual parsed row where each column value is asserted for its correctness.
Thank you,
Gary

@ranjithrp
Copy link
Author

@ranjithrp
I would prefer to see a unit test that that compares your example input with an actual parsed row where each column value is asserted for its correctness.
Thank you,
Gary

@garydgregory i have tried to do the same in src/test/java/org/apache/commons/csv/LexerTest.java. In one of the test case, i have set the boolean to true and shown how it is able to retrieve each token and have asserted the values also.
In another test case, i have set the boolean to false and shown how the parser throws an exception while parsing a token with quotes within.
Please let me know if you are looking for anything specific apart from this?

@garydgregory
Copy link
Member

garydgregory commented Jul 5, 2021

@ranjithrp

abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

Please document in the PR description the expected column values. It's not clear to me how malformed the input is and what you are exactly trying to work around. Is this really about dealing with malformed input. If the input was properly escaped or formatted, could the input be processed properly?
TY!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants