Bugfix for Complex JSON Array Unexpected Behavior #90
base: master
Conversation
Update iuabhmalat/jsonexport fork with changes from kaue/jsonexport
Link to example of current results relevant to claim in description (UPDATE: note to self that the author has provided an updated example link in the next comment)
Apologies, I should have wrapped the object with brackets in my PR (updated). You will see the behavior then.
Acknowledged. Due to the complexity it will take me another round of reviewing and examining. It would behoove you to attempt a basic summary for me if possible. It's taking me a lot of energy to compare results and work out exactly what the all-encompassing view of this work is. I believe you may have something here. Just trying to be extra careful to make sure we have full understanding and solid documentation. If a week or more goes by, please feel free to ping a comment again, as my plate is fairly full.
Sure thing, here is a breakdown of my changes. Apologies for the long write up.
The Problem
As stated in my initial write up, I noticed that as the results within a complex JSON array were being parsed into CSV rows, values could end up being attached to the wrong row. The table copied from above demonstrates the logic flow.
The issue noted in step 5 is due to the current code only checking if the header index is not undefined.
//
// Line 96 in lib/parser/csv.js
//
let elementHeaderIndex = getHeaderIndex(element.item);
if (currentRow[elementHeaderIndex] != undefined) {
...
This returns false in the above example, where we are attempting to insert a display name. Therefore the display name is added to the previous record's row instead of starting a new one.
My Proposed Solution
My proposed solution is to adjust the check for a new row to be whether the column at the current row is already populated, or whether the current element's header should come before the previously inserted one.
This, of course, needs to rely on the assumption that the data coming in is in an order sorted according to the JSON's schema. The goal with the changes there is to return the same array of items and values, just in an order that follows that schema.
Ex:
[
  {
    "a": {
      "b": true,
      "c": [
        { "d": 1 },
        { "d": 2 },
        { "d": 3 },
        { "d": 4 }
      ],
      "e": [
        { "f": 1 },
        { "f": 2 }
      ]
    }
  }
]

Becomes:

[
{ item: "a.b", value: true },
{ item: "a.c.d", value: 1 },
{ item: "a.c.d", value: 2 },
{ item: "a.c.d", value: 3 },
{ item: "a.c.d", value: 4 },
{ item: "a.e.f", value: 1 },
{ item: "a.e.f", value: 2 },
] While this was already the case for "perfect" JSON schemas, if there were keys out of order or there were Therefore, changes had to be made in two files:
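Before diving into the diffs, here is a rough standalone sketch of the flattening that produces those item/value pairs. This is illustrative only; the flatten function below is not the library's actual handler code:

function flatten(value, prefix = '') {
  if (Array.isArray(value)) {
    // Every array element contributes its own run of pairs under the same prefix.
    return value.flatMap((entry) => flatten(entry, prefix));
  }
  if (value !== null && typeof value === 'object') {
    return Object.keys(value).flatMap((key) =>
      flatten(value[key], prefix ? `${prefix}.${key}` : key)
    );
  }
  return [{ item: prefix, value }];
}

console.log(flatten({ a: { b: true, c: [{ d: 1 }, { d: 2 }], e: [{ f: 1 }] } }));
// -> [ { item: 'a.b', value: true }, { item: 'a.c.d', value: 1 },
//      { item: 'a.c.d', value: 2 }, { item: 'a.e.f', value: 1 } ]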
The Implementation
lib/parser/handler.js
constructor(options) {
...
+ this._headers = []
}
We need to track headers for this JSON object on a class level.
_handleArray(array) {
...
+ const getHeaderIndex = function(item) {
+ let index = self._headers.indexOf(item);
+ if (index === -1) {
+ if (item === null) {
+ self._headers.unshift(item);
+ } else {
+ self._headers.push(item);
+ }
+ index = self._headers.indexOf(item);
+ }
+ return index
+ }
+ const sortByHeaders = function(itemA, itemB) {
+ return getHeaderIndex(itemA.item) - getHeaderIndex(itemB.item);
+ }
...
}
As we are going to be sorting by the headers stored on the class level, we need a helper that returns (and registers, if needed) the index of a given header, plus a comparator that sorts items by those header indexes.
Now, this gets tricky because we cannot just sort these items at the end, as that will end up with the results grouped by header rather than by the record they came from. Example:
[
  {
    c: [
      {
        a: "Name 1",
        b: "Field 1"
      },
      {
        a: "Name 2",
        b: "Field 2"
      }
    ]
  }
]

Would result in:

[
{ item: "c.a", "Name 1" },
{ item: "c.a", "Name 2" },
{ item: "c.b", "Field 1" },
{ item: "c.b", "Field 2" },
] Therefore, we have to sort only the results at the depth they are at, which gets complex because + for (let bIndex=0; bIndex < resultCheckType.length; bIndex++) {
+ getHeaderIndex(resultCheckType[bIndex].item);
+ resultCheckType[bIndex]._depth = (resultCheckType[bIndex]._depth || 0) + 1
While I am iterating within the above loop, I check the current item's depth. If it is 1, the item belongs to the current level and is queued up for sorting; otherwise, the queued run is sorted and written back in place:
+ if (resultCheckType[bIndex]._depth === 1) {
+ toSort.push(resultCheckType[bIndex]);
+ } else if (toSort.length > 0) {
+ const sorted = toSort.sort(sortByHeaders)
+ for (let cIndex = 0; cIndex < sorted.length; cIndex++) {
+ resultCheckType[bIndex - sorted.length + cIndex] =
+ sorted[cIndex];
+ }
+ toSort = []
+ }
Then the same code is executed again at the end of the for loop, just in case we had any items left in the toSort queue.
I've put some output on how the result array looks during this recursive process at this gist.
The end result is an array of items and values that are in a proper order according to the JSON's schema!
lib/parser/csv.js
_parseArray(json, stream) {
...
+ let normalizedHeaders = []
...
+ let getNormalizedIndex = function(header) {
+ var index = normalizedHeaders.indexOf(header)
+ if (index === -1) {
+ normalizedHeaders.push(header)
+ index = normalizedHeaders.indexOf(header)
+ }
+ return index
+ }
...
}
The first changes are relatively self-explanatory. We can't use the existing header index for this check, so we keep a separate normalizedHeaders list that records headers in the order they are first encountered.
fillRows = function(result) {
...
+ let lastIndex = -1
...
for (let element of result) {
let elementHeaderIndex = getHeaderIndex(element.item);
+ let normalizedIndex = getNormalizedIndex(element.item)
+ if (
+ currentRow[elementHeaderIndex] != undefined ||
+ normalizedIndex < lastIndex
+ ) {
fillAndPush(currentRow);
currentRow = newRow();
}
emptyRowIndexByHeader[elementHeaderIndex] = emptyRowIndexByHeader[elementHeaderIndex] || 0;
+ lastIndex = normalizedIndex;
...
}
...
}
And finally, here we have the check for the normalized header index against the most recently placed normalized index: if the current element's header comes before the previous one, we know a new record has started and a new row is pushed.
Conclusion
While my proposed solution solves for my exact problem, I'm very new to this library and have not touched much beyond these two files, so please let me know if I have missed anything.
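As one final illustration of the ordering problem described above, here is a standalone, simplified sketch (not the library's code) showing why a single global sort is not enough: it groups values by header and pulls apart values that belong to the same record, which is why the patch only sorts each sibling run.

// Standalone illustration, not library code.
const items = [
  { item: 'c.a', value: 'Name 1' },
  { item: 'c.b', value: 'Field 1' },
  { item: 'c.a', value: 'Name 2' },
  { item: 'c.b', value: 'Field 2' },
];

const headers = [];
const headerIndex = (header) => {
  if (!headers.includes(header)) headers.push(header);
  return headers.indexOf(header);
};
const byHeader = (a, b) => headerIndex(a.item) - headerIndex(b.item);

// One global sort groups every "c.a" before every "c.b", separating
// "Name 2" from its sibling "Field 2":
const globallySorted = [...items].sort(byHeader);

// Sorting each per-record run on its own keeps siblings together:
const perRecordSorted = [items.slice(0, 2), items.slice(2)].flatMap((run) => run.sort(byHeader));

console.log(globallySorted, perRecordSorted);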
@AckerApple when you have time please follow up on that last round of review. @hdwatts Thanks for this contribution, it's really well documented 😄 I feel sorry that it's taking me so much time to reply; I'm hoping that next year I will be able to be more active in this community.
@kaue you bet friend, I will do a review within a max of 4 days. Fairly loaded, but I know you're just asking for an eyeball review. I will respond in time.
@kaue @AckerApple Thanks for checking back in on this, guys. I'll note that we've been using these changes on our production platform to export results from our GraphQL API over the past year and have yet to encounter any issues.
I would like to use the fixes proposed by @hdwatts... any chance you'll merge it? EDIT: I've spoken without fully testing. Now that I have, I get a number of new rows with mostly empty cells for nested objects... Not sure why...
@loganpowell I will try to find some time to review this PR, but if you can, please try using @hdwatts' branch and see if the fix is working properly. Sorry for the delay guys, busy times :/
Status
READY
Description
I had been dealing with issues similar to Issue #78, and pulled in PR #81 to help resolve them. However, I found a few further exceptions with complex array structures and schemas. I believe this PR is an alternative to PR #81, although that PR mentions fillGaps, which I don't think I address here. Unfortunately there are no tests in that PR, so I do not know if this addresses the fillGaps issue mentioned. The method used here is to keep a stricter track of the CSV headers, and ensure that we are only ever adding to a line if the current element is supposed to come after the previous element.
Old Logic
The old check only tests whether the current column is null, falsely applying investment "C" to company "B".
New Logic
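The New Logic boils down to the following decision (a sketch with illustrative names, not the literal patch): a new CSV row is started when the target cell is already occupied, or when the element's header would move backwards relative to the previous element.

// Illustrative sketch; names are mine, not the exact diff.
function shouldStartNewRow(currentRow, headerIndex, normalizedIndex, lastNormalizedIndex) {
  const cellAlreadyFilled = currentRow[headerIndex] != undefined; // existing check
  const headerMovedBackwards = normalizedIndex < lastNormalizedIndex; // new check
  return cellAlreadyFilled || headerMovedBackwards;
}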
The other exceptions are noted in the tests - please let me know if you have any questions.
Related PRs
List related PRs against other branches:
Todos
Steps to Test or Reproduce
Outline the steps to test or reproduce the PR here.
Tests have been added.
npm run test
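For a quick manual check against this branch, a minimal script like the one below can be used. This assumes the package's standard callback API; the exact CSV layout will depend on the options passed.

const jsonexport = require('jsonexport');

const data = [{
  a: {
    b: true,
    c: [{ d: 1 }, { d: 2 }, { d: 3 }, { d: 4 }],
    e: [{ f: 1 }, { f: 2 }]
  }
}];

// With this fix, each "d" and "f" value should land in its own row under the
// a.c.d / a.e.f headers instead of being merged onto an unrelated row.
jsonexport(data, (err, csv) => {
  if (err) return console.error(err);
  console.log(csv);
});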
Impacted Areas in Application
List general components of the application that this PR will affect:
lib/parser/handler.js
lib/parser/csv.js