For Programmers: Free Programming Magazines  


Home > Archive > Java Help > March 2004 > Fastest way of combining byte arrays of different lengths?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Fastest way of combining byte arrays of different lengths?
Princess Morgiah

2004-03-27, 12:30 am

Hi everyone,

What is the fastest way of combining x byte arrays of variable length into a
single new byte array?

I'm trying to download chunks off of a Socket, and return them as a single
byte array. At the moment I store all the chunks in a Vector, calculate the
total length, reserve that amount of bytes in a new buffer and fill the
buffer.

It works, but it's only a temporary solution - not clean, and especially not
fast (+100ms for a few 100k bytes).

Can somebody recommend me a faster way of doing this?

Thanks in advance,

Princess Morgiah


Jon A. Cruz

2004-03-27, 12:30 am

Princess Morgiah wrote:
> I'm trying to download chunks off of a Socket, and return them as a single
> byte array. At the moment I store all the chunks in a Vector, calculate the
> total length, reserve that amount of bytes in a new buffer and fill the
> buffer.
>
> It works, but it's only a temporary solution - not clean, and especially not
> fast (+100ms for a few 100k bytes).
>


Well... first of all for speed, be sure to wrap your input stream in a
BufferedInputStream.

Then look at ByteArrayOutputStream.

But just setting the size on the BufferedInputStream and tuning your
sleep a little might give you the best performance.

Daniel Sjöblom

2004-03-27, 12:30 am

Princess Morgiah wrote:

> Hi everyone,
>
> What is the fastest way of combining x byte arrays of variable length into a
> single new byte array?
>
> I'm trying to download chunks off of a Socket, and return them as a single
> byte array. At the moment I store all the chunks in a Vector, calculate the
> total length, reserve that amount of bytes in a new buffer and fill the
> buffer.
>
> It works, but it's only a temporary solution - not clean, and especially not
> fast (+100ms for a few 100k bytes).
>
> Can somebody recommend me a faster way of doing this?


That is probably as good as it gets algorithmically, but here are some
things that might improve performance:

1. Pool big arrays and read into them directly from the sockets. It is
expensive to allocate arrays in java, so there is no point in allocating
small buffers that you'll throw away later. In general, you'll probably
need a good cache mechanism to achieve any kind of performance.

2. Look into NIO

3. Don't merge the arrays. If you don't have to, don't do it.
--
Daniel Sjöblom
Remove _NOSPAM to reply by mail

Princess Morgiah

2004-03-27, 12:30 am

"Princess Morgiah" <princess_morgiah_nospamplease@yahoo.com> wrote in
message news:7VZ7c.46724$VF2.3347504@phobos.telenet-ops.be...
> Hi everyone,
>
> What is the fastest way of combining x byte arrays of variable length into

a
> single new byte array?
>
> I'm trying to download chunks off of a Socket, and return them as a single
> byte array. At the moment I store all the chunks in a Vector, calculate

the
> total length, reserve that amount of bytes in a new buffer and fill the
> buffer.
>
> It works, but it's only a temporary solution - not clean, and especially

not
> fast (+100ms for a few 100k bytes).


Thanks to everyone who replied. It seems that Daniel's advice was probably
the best - don't merge the arrays :)

Anyway, I'm having a bit of trouble with my code. I created a class to hold
a byte array and it's length (the amount of data that is actually filled),
both results from a read operation on a stream. Another class stores and
combines these blocks. It works just fine for text (say Html and so on), but
fails miserably when I try to get some binary data through.

Here is my code:
--- DataBlock ---
public class DataBlock
{ // start public class DataBlock

private byte [] data;
private int length;

public DataBlock(byte [] data, int length)
{ // start public DataBlock(byte [] data, int length)
this.data = data;
this.length = length;
} // end public DataBlock(byte [] data, int length)

public byte [] getData()
{ // start public byte [] getData()
return data;
} // end public byte [] getData()


public int getLength()
{ // start public int getLength()
return length;
} // end public int getLength()

} // end public class DataBlock

--- DataBlockHandler ---
import java.util.Vector;

public class DataBlockHandler
{ // start public class DataBlockHandler

private Vector vct;

public DataBlockHandler()
{ // start public DataBlockHandler()
initialize();
} // end public DataBlockHandler()

public void addDataBlock(DataBlock db)
{ // start public void addDataBlock(DataBlock db)
if (db != null)
{
vct.add(db);
}
} // end public void addDataBlock(DataBlock db)

public byte [] combineBlocks()
{ // start public byte [] combineBlocks()
byte [] data = null;
int length = 0;

if (vct.size() != 0)
{
for (int i = 0; i < vct.size(); i++)
{
DataBlock db = (DataBlock) vct.elementAt(i);
length += db.getLength();
}

data = new byte[length];
int offset = 0;
for (int i = 0; i < vct.size(); i++)
{
DataBlock db = (DataBlock) vct.elementAt(i);
byte [] temp = db.getData();
for (int x = 0; x < db.getLength(); x++)
{
data[offset++] = temp[x];
}
}
}

return data;
} // end public byte [] combineBlocks()

private void initialize()
{ // start private void initialize()
vct = new Vector();
} // end private void initialize()

} // end public class DataBlockHandler
--- end code ---

Any idea why this fails to combine binary data, but leaves html intact?

Thanks in advance,

Princess Morgiah



Jon A. Cruz

2004-03-27, 12:30 am

Princess Morgiah wrote:
> Anyway, I'm having a bit of trouble with my code. I created a class to hold
> a byte array and it's length (the amount of data that is actually filled),
> both results from a read operation on a stream. Another class stores and
> combines these blocks.


It really sounds like you've got wasted effort there, and
ByteArrayOutputStream could be used to do that for you.


> It works just fine for text (say Html and so on), but
> fails miserably when I try to get some binary data through.


Usually this is because text is not just a sequence of bytes, but of
characters. Remember, in Java characters are 16-bit unsigned Unicode values.



> --- DataBlockHandler ---
> import java.util.Vector;
>
> public class DataBlockHandler
> { // start public class DataBlockHandler
>

ByteArrayOutputStream collector;

>
> public DataBlockHandler()
> { // start public DataBlockHandler()
> initialize();
> } // end public DataBlockHandler()
>
> public void addDataBlock(DataBlock db)
> { // start public void addDataBlock(DataBlock db)
> if (db != null)
> {

collector.write( db.getData(), 0, db.getLength() );
> }
> } // end public void addDataBlock(DataBlock db)
>
> public byte [] combineBlocks()
> { // start public byte [] combineBlocks()

byte [] data = collector.toByteArray();
// optionally call collector.reset() here if that's what you need
> return data;
> } // end public byte [] combineBlocks()
>
> private void initialize()
> { // start private void initialize()

collector = new ByteArrayOutputStream();
> } // end private void initialize()
>
> } // end public class DataBlockHandler
> --- end code ---




>
> Any idea why this fails to combine binary data, but leaves html intact?


The most likely thing is what happens to the data once it leaves
combineBlocks();

You need to actually check. But try switching to the
ByteArrayOutputStream, as it will reduce your code length and reduce the
chance of bugs. You could also probably eliminate the DataBlock class
altogether.

Daniel Sjöblom

2004-03-27, 12:30 am

Princess Morgiah wrote:
> "Princess Morgiah" <princess_morgiah_nospamplease@yahoo.com> wrote in
> message news:7VZ7c.46724$VF2.3347504@phobos.telenet-ops.be...
>
>
> a
>
>
> the
>
>
> not
>
>
>
> Thanks to everyone who replied. It seems that Daniel's advice was probably
> the best - don't merge the arrays :)
>
> Anyway, I'm having a bit of trouble with my code. I created a class to hold
> a byte array and it's length (the amount of data that is actually filled),
> both results from a read operation on a stream. Another class stores and
> combines these blocks. It works just fine for text (say Html and so on), but
> fails miserably when I try to get some binary data through.


The error probably lies somewhere else. What do you use to read from the
Sockets? How does it fail? I'll add some other comments below:

> --- DataBlockHandler ---
> import java.util.Vector;
>
> public class DataBlockHandler
> { // start public class DataBlockHandler
>
> private Vector vct;
>
> public DataBlockHandler()
> { // start public DataBlockHandler()
> initialize();
> } // end public DataBlockHandler()
>
> public void addDataBlock(DataBlock db)
> { // start public void addDataBlock(DataBlock db)
> if (db != null)
> {
> vct.add(db);
> }
> } // end public void addDataBlock(DataBlock db)


This is a minor issue, but you should consider throwing a
NullPointerException or assert db != null, instead of ignoring input.
This can hide bugs in other parts of your program.

> public byte [] combineBlocks()
> { // start public byte [] combineBlocks()
> byte [] data = null;
> int length = 0;


Bad. Very bad. Don't return null when you can return an empty array of
length 0.

> if (vct.size() != 0)
> {
> for (int i = 0; i < vct.size(); i++)
> {
> DataBlock db = (DataBlock) vct.elementAt(i);
> length += db.getLength();
> }
>
> data = new byte[length];
> int offset = 0;
> for (int i = 0; i < vct.size(); i++)
> {
> DataBlock db = (DataBlock) vct.elementAt(i);
> byte [] temp = db.getData();
> for (int x = 0; x < db.getLength(); x++)
> {
> data[offset++] = temp[x];
> }


You do know about System.arraycopy? I suspect it will be much faster
than a manual copy loop.

> }
> }
>
> return data;


(returns null if handler empty)

> } // end public byte [] combineBlocks()


If you are interested in seeing if array pooling can improve
performance, try this class: (note, I don't make any guarantees of
anything, especially not multithreaded performance.)

import java.lang.ref.SoftReference;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

/**
* <p>A cacheing service for byte arrays. The cache is implemented with
SoftReferences, so that
* the cache will be garbage collected if the JVM is running low on
memory.</p>
*
* <p>To use the class you must first request access to the singleton
instance of the class.</p>
*
* <p>You can use the getByteArray anywhere you use new byte[]. It will
always return a usable array. When
* you no longer need an array, use the returnToCache method.</p>
*
*
* @author Daniel Sjöblom<br>
* Created on Mar 23, 2004<br>
* Copyright (c) 2004<br>
* @version 1.0<br>
*/
public class ByteArrayCache
{
// linked list is better than ArrayList here since there is no random
access.
private List list = new LinkedList();
private static ByteArrayCache instance = null;
/**
* Default private constructor
*
*/
private ByteArrayCache()
{

}

/**
* Adds specified array to this cache. The array can now freely be
accessed by getByteArray().
* Should only be called on arrays that are no longer needed.
* @param array byte[] to cache
* @throws NullPointerException if array == null
*/
public synchronized void returnToCache(byte[] array)
{
if (array == null)
{
throw new NullPointerException("Array can not be null");
}

SoftReference ref = new SoftReference(array);
ListIterator i = list.listIterator();
int len = array.length;
while (i.hasNext())
{
SoftReference oldref = (SoftReference) i.next();
Object o = oldref.get();
if (o == null)
{
i.remove();
}
else
{
byte[] tmp = (byte[]) o;
if (tmp.length <= len)
{
i.previous();
break;
}
}
}

i.add(ref);
}

/**
* Returns a byte[] that is at least as large as minimum size. If there
is no
* array of the required size in the cache, a new array is returned.
Currently uses
* 'worst fit' algorithm, i.e. the biggest block of at least the
requested size
* will be returned if it is in the cache.
* @param minimumSize
* @return byte array
* @throws IllegalArgumentException if minimumSize lesser than 0
*/
public synchronized byte[] getByteArray(int minimumSize)
{
if (minimumSize < 0)
{
throw new IllegalArgumentException("Negative size: " + minimumSize);
}

ListIterator i = list.listIterator();
while (i.hasNext())
{
SoftReference ref = (SoftReference) i.next();
Object o = ref.get();
if (o != null)
{
byte[] tmp = (byte[]) o;
if (tmp.length >= minimumSize)
{
ref.clear();
i.remove();
return tmp;
}
else
{
break;
}
}
else
{
i.remove();
}
}

return new byte[minimumSize];
}
/**
* Access the shared instance of this class
* @return instance of ByteArrayCache
*/
public static synchronized ByteArrayCache getInstance()
{
if (instance == null)
{
instance = new ByteArrayCache();
}

return instance;
}
}


--
Daniel Sjöblom
Remove _NOSPAM to reply by mail



Princess Morgiah

2004-03-27, 12:30 am

"Jon A. Cruz" <jon@joncruz.org> wrote in message
news:4061BF7C.9040802@joncruz.org...
> Princess Morgiah wrote:
hold[color=darkred]
filled),[color=darkred]
>
> It really sounds like you've got wasted effort there, and
> ByteArrayOutputStream could be used to do that for you.


Not only does it sound like a wasted effort - it was. I was merely trying to
keep following the chosen route...

Anyway, I tried it with a ByteArrayOutputStream and it works like a charm!
Perfect.

>
> Usually this is because text is not just a sequence of bytes, but of
> characters. Remember, in Java characters are 16-bit unsigned Unicode

values.
>
>
> The most likely thing is what happens to the data once it leaves
> combineBlocks();


Indeed - I have proven this by writing out files throughout the process,
containing the bytes being shipped around. Prior to the combineBlocks()
method, all bytes are alive and kicking.

> You need to actually check. But try switching to the
> ByteArrayOutputStream, as it will reduce your code length and reduce the
> chance of bugs. You could also probably eliminate the DataBlock class
> altogether.


You were right about both - thank you Jon!

Princess Morgiah


Princess Morgiah

2004-03-27, 12:30 am

"Daniel Sjöblom" <dsjoblom@mbnet.fi_NOSPAM> wrote in message
news:4061da80$0$27069$7b6a8dc4@news.mbnet.fi...
> Princess Morgiah wrote:
<snip>[color=darkred]
> The error probably lies somewhere else. What do you use to read from the
> Sockets? How does it fail? I'll add some other comments below:


I've checked the bytes throughout the process, by writing the arrays to
temporary files as they are shipped from method to method.

It fails because the end result differs from the input - when I'm shipping
an image around, the resulting file is the same size as the original,
contains some of the original blocks, but is no longer an image.

<snip>
>
> This is a minor issue, but you should consider throwing a
> NullPointerException or assert db != null, instead of ignoring input.
> This can hide bugs in other parts of your program.


Granted.

>
> Bad. Very bad. Don't return null when you can return an empty array of
> length 0.


What's the difference? As long as the calling method handles both situations
(empty/non empty) the outcome is the same. Or am I missing something here?

>
> You do know about System.arraycopy? I suspect it will be much faster
> than a manual copy loop.


Ehm... Yes. This is a beautiful example of writing code on the fly - you
think about what it is supposed to do, write it out in one go and end up
reinventing the wheel.

At least this is test code, not production :)

>
> (returns null if handler empty)
>
>
> If you are interested in seeing if array pooling can improve
> performance, try this class: (note, I don't make any guarantees of
> anything, especially not multithreaded performance.)

<snipped code>

Thank you for your code, I'll definitely check it out.

For this specific problem I already went with the solution Jon gave, and
used a ByteArrayOutputStream.

Thanks again for your time and effort,

Princess Morgiah



Jon A. Cruz

2004-03-27, 12:30 am

Princess Morgiah wrote:
>
> I've checked the bytes throughout the process, by writing the arrays to
> temporary files as they are shipped from method to method.
>
> It fails because the end result differs from the input - when I'm shipping
> an image around, the resulting file is the same size as the original,
> contains some of the original blocks, but is no longer an image.


Looking at what exactly are the differences can help.

Are there missing blocks? Zeroed blocks? Certain bytes that are
converted to certain other bytes?

Does 0x0a turn to 0x0d or vice versa?
Do different bytes get converted to 0x3f?

Daniel Sjöblom

2004-03-27, 12:30 am

Princess Morgiah wrote:
>
>
> I've checked the bytes throughout the process, by writing the arrays to
> temporary files as they are shipped from method to method.
>
> It fails because the end result differs from the input - when I'm shipping
> an image around, the resulting file is the same size as the original,
> contains some of the original blocks, but is no longer an image.


I don't want to sound like I doubt you, but I could swear you are using
some encoding to mess things up. There is nothing wrong with the merge
method. Try using this on any file on your computer, and see if it
works: (It shouldn't print anything but the last line if everything is ok)

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test
{

public static void main(String[] args) throws IOException
{
// swap a real file on your system here
File file = new File("d:\\printn.com");
InputStream stream = new FileInputStream(file);

int len = (int) (file.length() / 4);

byte[] wholefile = new byte[(int) file.length()];
byte[] arr = new byte[len];
byte[] arr2 = new byte[len*2];
byte[] arr3 = new byte[(int) (file.length() - (len * 3))];

stream.read(arr);
stream.read(arr2);
stream.read(arr3);
stream.close();

stream = new FileInputStream(file);
stream.read(wholefile);
stream.close();

DataBlockHandler h = new DataBlockHandler();
h.addDataBlock(new DataBlock(arr, arr.length));
h.addDataBlock(new DataBlock(arr2, arr2.length));
h.addDataBlock(new DataBlock(arr3, arr3.length));

byte[] merge = h.combineBlocks();
for (int i=0; i < merge.length; i++)
{
if (merge[i] != wholefile[i])
{
System.out.println("Error: not equal " + merge[i] + " " + wholefile[i]);
}
}
System.out.println("Length " + merge.length + " " + wholefile.length +
" match: " + (merge.length == wholefile.length));
}
}



>
>
> What's the difference? As long as the calling method handles both situations
> (empty/non empty) the outcome is the same. Or am I missing something here?


No, the problem is exactly what you said. As long as... it is so very
easy to forget stuff like that. Plus it makes code ugly. Returning an
empty array is always safe.

> For this specific problem I already went with the solution Jon gave, and
> used a ByteArrayOutputStream.


Good idea. While it is around 3-4 times slower (after System.arraycopy
fix) it is certainly easier to debug.

> Thanks again for your time and effort,


No problem. Your problem is/was quite interesting.
--
Daniel Sjöblom
Remove _NOSPAM to reply by mail


Princess Morgiah

2004-03-27, 12:30 am

"Jon A. Cruz" <jon@joncruz.org> wrote in message
news:406270F3.4020000@joncruz.org...
> Princess Morgiah wrote:
shipping[color=darkred]
>
> Looking at what exactly are the differences can help.
>
> Are there missing blocks? Zeroed blocks? Certain bytes that are
> converted to certain other bytes?
>
> Does 0x0a turn to 0x0d or vice versa?
> Do different bytes get converted to 0x3f?


As far as I can tell by comparing the input and output files, some of the
blocks are there - in the output file, a part of the input file starts at a
certain offset.

I guess something was wrong outside this piece of code - I ran the test
Daniel made up and my code behaved just fine for both text and binary files.

Anyway, I've used the ByteArrayOutputStream in the end, so the code that
generated the error is long gone - the program is now up and running and has
already been extensively tested on a lot of different input files, including
those that failed in the beginning.

Thanks again,

Princess Morgiah


Princess Morgiah

2004-03-27, 12:30 am

"Daniel Sjöblom" <dsjoblom@mbnet.fi_NOSPAM> wrote in message
news:4062ef28$0$20231$7b6a8dc4@news.mbnet.fi...
> Princess Morgiah wrote:

<snipped message>
> I don't want to sound like I doubt you, but I could swear you are using
> some encoding to mess things up. There is nothing wrong with the merge
> method. Try using this on any file on your computer, and see if it
> works: (It shouldn't print anything but the last line if everything is ok)


<snipped code>

.... which, as I mentioned in my post to Jon, works just fine. Tested it on
text files - no problem. Binary files give the same output, a clean run.

It must have been some sort of encoding I've overlooked, or some kind of bug
I didn't spot. Anyway, since I'm using the ByteArrayOutputStream method now,
that particular piece of problem code is deleted anyway.

situations[color=darkred]
here?[color=darkred]
>
> No, the problem is exactly what you said. As long as... it is so very
> easy to forget stuff like that. Plus it makes code ugly. Returning an
> empty array is always safe.


You're right. Better not rely on trusting to know the details months later
on. Thanks for pointing it out - time for a code review!

>
> Good idea. While it is around 3-4 times slower (after System.arraycopy
> fix) it is certainly easier to debug.


It sure is - as soon as I made the switch, I got my code to work just fine.

>
> No problem. Your problem is/was quite interesting.


I'll do my very best to create an evenly interesting one next time ;)

Thank you very much,

Princess Morgiah


Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com