No array column but get "Array index out of range: 1048576" #378
Comments
Good catch — can you try to modify the compression code from:

…

to:

…

I think this may be the reason.
Thanks for the suggestion. I've thought about this as well. The problem is, based on the comment on the function:

```java
/**
 * Compresses <code>src[srcOff:srcOff+srcLen]</code> into
 * <code>dest[destOff:destOff+maxDestLen]</code> and returns the compressed
 * length.
 *
 * This method will throw a {@link LZ4Exception} if this compressor is unable
 * to compress the input into less than <code>maxDestLen</code> bytes. To
 * prevent this exception to be thrown, you should make sure that
 * <code>maxDestLen >= maxCompressedLength(srcLen)</code>.
 *
 * @param src the source data
 * @param srcOff the start offset in src
 * @param srcLen the number of bytes to compress
 * @param dest the destination buffer
 * @param destOff the start offset in dest
 * @param maxDestLen the maximum number of bytes to write in dest
 * @throws LZ4Exception if maxDestLen is too small
 * @return the compressed size
 */
public abstract int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff, int maxDestLen);
```

if the exception were caused by the last parameter (aka. `maxDestLen`) being too small, we should see an `LZ4Exception` rather than an `ArrayIndexOutOfBoundsException`. In CompressedBuffedWriter.java, we can find … So I think if …
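That documented behaviour is easy to confirm in isolation. A minimal sketch (assuming the lz4-java library, `net.jpountz.lz4`, whose javadoc is quoted above) shows that an undersized `maxDestLen` produces an `LZ4Exception`, not an `ArrayIndexOutOfBoundsException`:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Exception;
import net.jpountz.lz4.LZ4Factory;

import java.util.Random;

public class MaxDestLenCheck {
    public static void main(String[] args) {
        LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
        byte[] src = new byte[1024];
        new Random(42).nextBytes(src); // random bytes compress poorly, so 8 output bytes cannot suffice
        byte[] dest = new byte[compressor.maxCompressedLength(src.length)];

        try {
            // Deliberately pass a maxDestLen far below maxCompressedLength(srcLen).
            compressor.compress(src, 0, src.length, dest, 0, 8);
        } catch (LZ4Exception e) {
            // Expected per the javadoc: a too-small maxDestLen yields LZ4Exception.
            System.out.println("Got LZ4Exception as documented: " + e.getMessage());
        }
    }
}
```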
Yes, surely. From the comment — `@param maxDestLen the maximum number of bytes to write in dest` — but what if `maxDestLen` is calculated from `destOff`? Then we may be giving a too-large value to `maxDestLen`.
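A quick sketch of how a miscalculated `maxDestLen` could surface (same lz4-java assumption; the destination offset of 25 mirrors the driver's `9 + 16`): if the caller passes `dest.length` instead of `dest.length - destOff`, the compressor's range check looks past the end of `dest` and does throw an `ArrayIndexOutOfBoundsException`:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

public class WrongMaxDestLen {
    public static void main(String[] args) {
        LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
        byte[] src = new byte[100];
        int destOff = 25; // 9 + 16, as in the driver code discussed below
        byte[] dest = new byte[destOff + compressor.maxCompressedLength(src.length)];

        // WRONG: maxDestLen must be dest.length - destOff. Passing dest.length makes
        // checkRange(dest, destOff, maxDestLen) index past the array end, so this
        // line throws ArrayIndexOutOfBoundsException rather than LZ4Exception.
        compressor.compress(src, 0, src.length, dest, destOff, dest.length);
    }
}
```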
Good point, let me give it a try.
Oh, I think that's the reason: …
Re-ran the pipeline for the whole night with the newly built Jar; the problem persists. In fact, I found that the code of the latest release uses

```java
int res = lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16);
```

Based on LZ4Compressor.java, this is an overload that delegates to the six-argument method:

```java
public final int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff) {
    return compress(src, srcOff, srcLen, dest, destOff, dest.length - destOff);
}
```

So it is the same as

```java
int res = lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16, compressedBuffer.length - (9 + 16));
```

Since … But still, we didn't find the root cause of "Array index out of range: 1048576".
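To double-check that the two calls really are equivalent, a tiny sketch (same lz4-java assumption; `9 + 16` follows the snippet above):

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

import java.nio.charset.StandardCharsets;

public class OverloadEquivalence {
    public static void main(String[] args) {
        LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
        byte[] src = "some sample payload, repeated repeated repeated".getBytes(StandardCharsets.UTF_8);
        int headerLen = 9 + 16;
        byte[] dest = new byte[headerLen + compressor.maxCompressedLength(src.length)];

        int five = compressor.compress(src, 0, src.length, dest, headerLen);
        int six = compressor.compress(src, 0, src.length, dest, headerLen, dest.length - headerLen);
        System.out.println(five == six); // true: the 5-arg overload just fills in dest.length - destOff
    }
}
```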
I may have found the reason. In the implementation of `compress`:

```java
public final int compress(byte[] src, int srcOff, int srcLen, byte[] dest,
                          int destOff, int maxDestLen) {
    checkRange(src, srcOff, srcLen);
    checkRange(dest, destOff, maxDestLen);
    ...
```

Considering

```java
public static void checkRange(byte[] buf, int off) {
    if (off < 0 || off >= buf.length) {
        throw new ArrayIndexOutOfBoundsException(off);
    }
}

public static void checkRange(byte[] buf, int off, int len) {
    checkLength(len);
    if (len > 0) {
        checkRange(buf, off);
        checkRange(buf, off + len - 1);
    }
}
```

this problem could also be caused by the source array … The variable … is set up in

```java
public BinarySerializer(BuffedWriter writer, boolean enableCompress) {
    this.enableCompress = enableCompress;
    BuffedWriter compressBuffer = null;
    if (enableCompress) {
        compressBuffer = new CompressedBuffedWriter(ClickHouseDefines.SOCKET_SEND_BUFFER_BYTES, writer);
    }
    either = new Either<>(writer, compressBuffer);
}
```

The … I'll try again with a bigger value for `ClickHouseDefines.SOCKET_SEND_BUFFER_BYTES`.
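Putting the numbers together: with a `writtenBuf` of `SOCKET_SEND_BUFFER_BYTES` bytes (presumably 1048576, judging by the error value) and a `position` one past that capacity, the quoted `checkRange(src, srcOff, srcLen)` fails with exactly the reported message. A self-contained sketch mirroring the quoted checks:

```java
public class CheckRangeRepro {
    // Mirrors the range checks quoted above (net.jpountz.util).
    static void checkLength(int len) {
        if (len < 0) {
            throw new IllegalArgumentException("lengths must be >= 0");
        }
    }

    static void checkRange(byte[] buf, int off) {
        if (off < 0 || off >= buf.length) {
            throw new ArrayIndexOutOfBoundsException(off);
        }
    }

    static void checkRange(byte[] buf, int off, int len) {
        checkLength(len);
        if (len > 0) {
            checkRange(buf, off);
            checkRange(buf, off + len - 1);
        }
    }

    public static void main(String[] args) {
        byte[] writtenBuf = new byte[1048576]; // assumed SOCKET_SEND_BUFFER_BYTES
        int position = 1048577;                // one past the buffer capacity
        // Throws java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1048576
        checkRange(writtenBuf, 0, position);
    }
}
```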
It seems the position is 1048577, but I could not find any reason why.
You can modify the code to print a log whenever the position is larger than 1048576.
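Something like this, inserted before the compress call — a rough sketch only; `position` and `capacity` are illustrative names, and the real fields in CompressedBuffedWriter may differ:

```java
final class OverrunLogger {
    // Log (with a stack trace) whenever the write position exceeds the buffer capacity.
    static void logIfOverrun(int position, int capacity) {
        if (position > capacity) { // capacity would be SOCKET_SEND_BUFFER_BYTES, i.e. 1048576
            System.err.println("position overran buffer: position=" + position
                    + ", capacity=" + capacity);
            Thread.dumpStack(); // record where the oversized position came from
        }
    }
}
```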
Environment
Error logs
Steps to reproduce
Using Spark to ingest a large dataframe (>100M rows) with many columns (>5000) into ClickHouse.
Here is the code I use:
Other descriptions
Based on this issue, this could be caused by inserting an array column. But the dataframe I inserted contains only StringType, TimestampType, and LongType columns.
I've also tried to do some investigation on these source files:
Unfortunately, I still couldn't find the cause.