Oracle v. Google (2010, USA)

Revision as of 19:07, 22 December 2010 by 97.95.186.121 (talk) (Prior art suggestions)
(For an overview of Java's patent risks and protections, see Java and patents)

In August 2010, Oracle filed a lawsuit against Google for "developing Android".[1] The suit claims wilful infringement, by Google's Android software, of software patents related to the Java programming language. The lawsuit was filed in the US District Court for the Northern District of California.

Oracle acquired Java when they purchased Sun.

The patents

Oracle claims that Google infringes these seven patents granted by the USPTO:

  • 6,125,447 - Protection Domains To Provide Security In A Computer System
  • 6,192,476 - Controlling Access To A Resource
  • 5,966,702 - Method And Apparatus For Preprocessing And Packaging Class Files
  • 7,426,720 - System And Method For Dynamic Preloading Of Classes Through Memory Space Cloning Of A Master Runtime System Process
  • RE38,104 - Method And Apparatus For Resolving Data References In Generated Code
  • 6,910,205 - Interpreting Functions Utilizing A Hybrid Of Virtual And Native Machine Instructions
  • 6,061,520 - Method And System For Performing Static Initialization

Does Java being distributed under GPL v2 help?

Publishing something under the GPL version 2 confers certain protections because sections 6 and 7 contain an implied patent licence. (Distributing under v3 would be better since it has an explicit patent grant.)

No, because Google reimplemented the JRE rather than using OpenJDK.

That really should not matter! It's still use of the source according to the GPL, even if they only 'learned' about the patented parts
Yes, it does matter, because they're not using the OpenJDK source (which has the implied patent license), but instead the Apache Harmony/etc. source, which (obviously) does NOT have an implied patent license from Sun/Oracle/etc. The patent license granted by OpenJDK only covers OpenJDK derivative works, not reimplementations. Consider the ramifications if this weren't the case: if company A releases software under the GPL with the implied patent grant and company B writes proprietary software that infringes on patents held by A, should A be stopped from suing B for patent infringement (when B isn't using A's GPL'd source)? If this were the case, GPL'ing software would be akin to placing patents into the public domain, which is NOT the case.
Well, it _is_ a derivative work! They looked at it and wrote it again with improvements that are also licensed under the GPL. So for me this _is_ a derivative work that has the same implicit patent reuse allowance directly from friendly Oracle!
"Derivative Work" means something: it means that it is "based on" copying from the original source code/object code. If we follow your logic, then Wine is a derivative work of Windows (it implements the same APIs!), Samba is a derivative work (it clearly implements the CIFS protocol!), Linux is a derivative work of Unix/POSIX (same APIs!), etc., etc. Clearly none of these are actually considered to be derivative works.
What did they look at? Harmony is a "clean room" implementation. It doesn't have source from OpenJDK so it is NOT a derivative work.
Any derivative work of the OpenJDK needs to be distributed under GPLv2. Harmony however is published under the Apache License, so it cannot be a derivative work.

Would switching to IcedTea or OpenJDK make Android safe from Oracle?

(Note: This section is about the legal situation, not about the motivations of Google.)

Section 7 of GPLv2 means that Oracle agrees not to use patents to prevent royalty-free distribution of the Java software which they distribute under GPLv2. OpenJDK is distributed under GPLv2 plus the "Classpath exception".[2]

By using IcedTea or OpenJDK, Google would have a patent grant from Oracle because Oracle distributed OpenJDK under GPLv2. Thus, the patent grant of the Java Language Specification would not be used, so the limits to that promise (no subsetting, no incompatible changes) would not apply.

Does the grant in the Java Language Specification help?

As suggested by Bruce Perens,[3] Oracle's position may be weakened by this grant in the Java Language Specification:[4]

Sun Microsystems, Inc. (SUN) hereby grants to you a fully paid, nonexclusive, nontransferable, perpetual, worldwide limited license (without the right to sublicense) under SUN's intellectual property rights that are essential to practice this specification. This license allows and is limited to the creation and distribution of clean room implementations of this specification that:
(i) include a complete implementation of the current version of this specification without subsetting or supersetting;
(ii) implement all the interfaces and functionality of the required packages of the Java 2 Platform, Standard Edition, as defined by SUN, without subsetting or supersetting;
(iii) do not add any additional packages, classes, or interfaces to the java.* or javax.* packages or their subpackages;
(iv) pass all test suites relating to the most recent published version of the specification of the Java 2 Platform, Standard Edition, that are available from SUN six (6) months prior to any beta release of the clean room implementation or upgrade thereto;
(v) do not derive from SUN source code or binary materials; and
(vi) do not include any SUN source code or binary materials without an appropriate and separate license from SUN.

However, because these clauses put the emphasis on not subsetting or supersetting, which Android does, this grant may not apply.

Further, this grant does not include the right to sublicense. That is incompatible with any free software licence and means that only meeting all of the six onerous requirements would actually grant a patent license.

To acquire this patent grant the software has to pass all test suites (iv). The test suites are only available to JCP members and have a restriction that limits the use of the tested software to desktops.

Comparisons with Mono and C#

(Moved to: Comparison of Java and C-sharp)

OIN patent pool

Although Dalvik isn't part of OIN, the GNU Java implementation (GCJ/libgcj, part of GCC and derived from GNU Classpath) is part of the OIN "System Components" list. Both Oracle and Google have joined OIN. Since this covers the same concepts, does that mean OIN should take action?

Searching for prior art

What follows is a quick association for each patent with techniques I have already heard of (about 10 minutes per patent .. something the patent office obviously wasn't able to do ..)

Maybe these can be invalidated in a review.

6,125,447 / 1997

1. A method for providing security, the method comprising the steps of:

establishing one or more protection domains, wherein a protection domain
is associated with zero or more permissions;

establishing an association between said one or more protection domains
and one or more classes of one or more objects; and

determining whether an action requested by a particular object is
permitted based on said association between said one or more protection
domains and said one or more classes.
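To make the three steps of claim 1 easier to compare against the prior art suggestions below, here is a minimal Python sketch of them. All names (`ProtectionDomain`, `DOMAIN_OF_CLASS`, `is_permitted`, the permission strings) are invented for illustration; none of them come from the patent or from any Java API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProtectionDomain:
    """Step 1: a domain is associated with zero or more permissions."""
    name: str
    permissions: frozenset = field(default_factory=frozenset)


# Hypothetical domains, for illustration only.
SYSTEM = ProtectionDomain("system", frozenset({"exit_vm", "read_file"}))
SANDBOX = ProtectionDomain("sandbox")  # zero permissions


class TrustedTool:
    pass


class DownloadedApplet:
    pass


# Step 2: associate classes (not individual objects or members) with domains.
DOMAIN_OF_CLASS = {TrustedTool: SYSTEM, DownloadedApplet: SANDBOX}


def is_permitted(obj, action):
    """Step 3: decide via the domain of the requesting object's class."""
    # Assumption of this sketch: classes without a domain get no rights.
    domain = DOMAIN_OF_CLASS.get(type(obj), SANDBOX)
    return action in domain.permissions
```

In this sketch the decision depends only on which class the requesting object belongs to, which is the point argued over in the comments that follow.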

Prior art: This is C++ private / protected.

No, those apply at the level of individual members - not classes. Can you help? Is that correct?


See http://download.oracle.com/javase/1.4.2/docs/api/java/lang/SecurityManager.html for what this is about. The claim that this relates to object-level member security is incorrect. While conceivably a computer-illiterate jury could be convinced that these were the same thing, that analysis does not stand up to minimal engineering scrutiny. What this describes is a mechanism by which library code in the SDK calls a system-level security manager from within one of its methods. The security manager uses contextual information, an ad-hoc object token and the call stack to determine whether the SDK method violates the application's security model (e.g. System.exit(int) calls checkExit(), which determines whether the caller is allowed to terminate the application process). The security manager is pluggable at the application level and may be called by any code, but is typically used within the libraries that comprise the Java SDK. This is not the same thing as object-member access at all.

It looks like a capability-based OS: [1]

Most operating systems use some sort of role-based protection. Even in the description above (SecurityManager), it is only emulating OS security rules with a bit more system-agnostic sugar.

It also smells like PAM, proposed by SUN in an RFC in 1995.

-> The difference: Claim 1 (one, only one!) is only what is written above and nothing more. So here C++ private / protected are perfectly valid protection domains. Maybe we have to find something even more accurate to invalidate one of the claims 2 to n, but in claim 1 there is no single word about members

Members may be classes. Such nested classes were documented in the first edition of 'The C++ Programming Language' (1985).

C++ (pronounced see plus plus) is a statically typed, free-form,
multi-paradigm, compiled, general-purpose programming language. It is
regarded as a "middle-level" language, as it comprises a combination of
both high-level and low-level language features. It was developed by
Bjarne Stroustrup starting in 1979 at Bell Labs as an enhancement to the
C programming language and originally named C with Classes. It was
renamed C++ in 1983.

--> The Java packaging system seems to conform to all the claims:

1) Protection domains would be the specific packages. 2) A hierarchical association exists between packages and the visibility of classes depends on their declared visibility operators. 3) Objects interact based on the visibility of their class and the class's members.

So if any other language used a similar packaging prior to Java it could be considered prior art.

I would guess KeyKOS and its capability system would apply. See e.g. http://www.cis.upenn.edu/~KeyKOS/OSRpaper.html That puts prior art in 1985.

Not prior art, but this method is clearly obvious. It has been a common technique used in security. Just replace classes and objects with profiles/groups and users and you have the basic security scheme found in almost any modern OS.

-> This isn't necessarily a compile-time but a run-time method and might refer to the JVM security model (byte code verifier and java.lang.SecurityManager)

--> Which basically emulates OS security management since the 80's

Has anyone looked at: Butler Lampson, "Dynamic protection structures", Proceedings AFIPS Conf. 35 (1969), pp 27-38. http://research.microsoft.com/en-us/um/people/blampson/06-DynamicProtect/06-DynamicProtectAbstract.html He introduces domains, discusses capabilities and shows a variety of protection schemes.

The OMG CORBA Security Service from 1994/5 has this concept (see http://www.omg.org/technology/documents/formal/omg_security.htm, there are earlier versions, drafts and individual submission papers preceding these). The term "protection domain" was well known to the submitters from earlier developments (possibly a European standards initiative) and appears as a core part of the specification as "security domain". However, it's possible that some of the source documents retained "protection domain".

Some overlap between Java and CORBA security would be expected as the latter docs were well known to Java designers, e.g. the influence of the CORBA Transaction Service on the Java JTA and JTS designs was extensive.

6,192,476 / 1997

1. A method for providing security, the method comprising the steps of:

detecting when a request for an action is made by a principal; and

in response to detecting the request, determining whether said action is
authorized based on permissions associated with a plurality of routines
in a calling hierarchy associated with said principal, wherein said
permissions are associated with said plurality of routines based on a
first association between protection domains and permissions.
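A minimal Python sketch of the check this claim describes may help when weighing the comparisons below. It assumes that an action is authorized only if every routine in the calling hierarchy belongs to a domain holding the permission; the claim itself only says the decision is "based on" those permissions. The domain names, permission strings and the `is_authorized` helper are hypothetical.

```python
# Hypothetical "first association" between protection domains and permissions.
DOMAIN_PERMISSIONS = {
    "system": {"read_file", "exit_vm"},
    "applet": {"read_file"},
}


def is_authorized(calling_hierarchy, action):
    """calling_hierarchy: list of (routine_name, domain) pairs, outermost first.

    Assumption of this sketch: every routine on the call stack must hold
    the permission, so one unprivileged caller is enough to deny it.
    """
    return all(
        action in DOMAIN_PERMISSIONS.get(domain, set())
        for _routine, domain in calling_hierarchy
    )


# An applet calling into trusted library code.
stack = [("Applet.run", "applet"), ("System.exit", "system")]
print(is_authorized(stack, "read_file"))  # both domains allow it
print(is_authorized(stack, "exit_vm"))    # the applet frame does not
```

The decision here is driven by the routines on the call stack rather than by a user ID, which is the distinction raised in the replies below.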

Sounds like Access Control Lists (ACLs), for example documented within POSIX.1e.

This is based on the call stack, not user ID.

-> So what's the difference? I can implement that security on every abstract information unit .. numbers of processes, numbers of classes, numbers on cars, numbers on 32-dimensional-hybrid-airspacecraftplanes, whatever .. for the computer scientist all of these are just numbers ..

Rather than comparing to an ACL, I would say it resembles in part the Supervisor() function in AmigaOS (from 1985). The Supervisor() and SuperState() functions in AmigaOS were used to run code in privileged mode (i.e. kernel mode). They worked by having the handler for the exception caused by setting the supervisor bit in the processor check whether the cause of the exception was located in one of these functions. If it was not, the attempt to enter supervisor mode would not be allowed.

This sounds exactly like the mechanism specified in CORBA Security (see above), and for similar reasons.

5,966,702 / 1997

1. A method of pre-processing class files comprising:

determining plurality of duplicated elements in a plurality of class files;

forming a shared table comprising said plurality of duplicated elements;

removing said duplicated elements from said plurality of class files to
obtain a plurality of reduced class files; and

forming a multi-class file comprising said plurality of reduced class
files and said shared table
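The four steps of this claim can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a "class file" is modelled as a plain list of constant-pool-like strings, and the function name `build_multi_class_file` and the output layout are made up here, not taken from the patent.

```python
from collections import Counter


def build_multi_class_file(class_files):
    """class_files: dict mapping class name -> list of constant entries."""
    # Determine the duplicated elements: entries that occur in more than
    # one class file (each file counts at most once per entry).
    occurrences = Counter(
        entry for entries in class_files.values() for entry in set(entries)
    )

    # Form the shared table from the duplicated elements, in first-seen order.
    shared_table = []
    for entries in class_files.values():
        for entry in entries:
            if occurrences[entry] > 1 and entry not in shared_table:
                shared_table.append(entry)

    # Remove the duplicated elements to obtain the reduced class files.
    reduced = {
        name: [entry for entry in entries if entry not in shared_table]
        for name, entries in class_files.items()
    }

    # Form the multi-class file: reduced class files plus the shared table.
    return {"shared_table": shared_table, "classes": reduced}
```

For example, two classes that both reference `java/lang/Object` and `<init>` end up with those two entries stored once in the shared table and only their own entries left in the reduced files.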

This one is easy: select distinct count(*) from xyz-table;

This is very old SQL-92 (a public standard published in 1992).

Irrelevant; this is not a database. Conventional linkers might be relevant prior art.

-> This is not irrelevant! For the computer scientist, it's just a distinct count of abstract numbers. So it's really the same for every single computer scientist .. although it might be completely different for lawyers. But patent law talks about specialists in the appropriate field, not about lawyers who are too stupid to understand the technical abstractions every specialised scientist learns within the first 5 semesters at university.

Sounds like linking of weak symbols? For how long has GNU ld/GCC supported this feature? A draft of the early System V ABI specification mentions weak symbols, but is from 1998: Symbol Table. I guess other Unixes had that feature much earlier?

This patent is about removing duplicate code and creating multi-class files. The multi-class file is just obvious: C++ .obj files have been multi-class files from the start. Also, removing duplicate symbols, such as code or constants, has been done by optimizing C++ compilers for a very, very long time. This is just applying those well-known concepts to Java.

Cfront 3.0 (1992) added support for automatic template instantiation. Obviously, it had to deal with removing multiple instantiations at link-time at one point. This publication (by the inventor of C++ and covering C++ evolution in the 1979-1991 time frame) mentions the use of base classes to express commonality among instantiated templates (p.14).

Obvious avenues to check: shared object libraries and symbol tables in OSes such as IBM S/38 and Multics (?); Symbolics and similar large Lisp/CLOS systems.

7,426,720 / 2003

1. A system for dynamic preloading of classes through memory space
cloning of a master runtime system process, comprising: a processor; a
memory; a class preloader to obtain a representation of at least one
class from a source definition provided as object-oriented program code;
a master runtime system process to interpret and to instantiate the
representation as a class definition in a memory space of the master
runtime system process; a runtime environment to clone the memory space
as a child runtime system process responsive to a process request and to
execute the child runtime system process; and a copy-on-write process
cloning mechanism to instantiate the child runtime system process by
copying references to the memory space of the master runtime system
process into a separate memory space for the child runtime system
process, and to defer copying of the memory space of the master runtime
system process until the child runtime system process needs to modify
the referenced memory space of the master runtime system process.

This is simple copy-on-write process memory management, as already implemented by fork() in early publicly available Linux kernels (I remember that version 2.2 already had it, published 1999, but probably older versions also had it). See also the Unix specification for fork from 1997, although it doesn't mention implementation details. The source code of old Linux versions should help here.

No, I don't think so. This is referring to the Android Zygote pre-prepared process. There may well be prior art, but simple fork() is not it.

Actually, yes, it is. If an executable is a class definition, and a running process is a class instance, then fork() with copy on write is exactly an implementation of this.

Same deal with Apache and its worker processes: the configuration file ('source') is only read by the master process, and the worker processes are forked in a state where they are immediately ready to handle requests and don't each have to reread the configuration. And the worker's memory state is copy-on-write wrt. the master process.

Reads to me like a description of old-school UNIX fork with copy-on-write memory mapping. Reference: S. Leffler, M. McKusick, M. Karels, J. Quarterman: The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, January 1989, ISBN 0-201-06196-1
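The fork() argument made in this thread can be tried out with a short Python sketch on a POSIX system. The dictionary of "preloaded classes" and the function name `preload_then_fork` are stand-ins invented for illustration. Whether pages are really shared copy-on-write is up to the kernel, and CPython's reference counting touches some pages even on reads, so the sketch only shows the visible semantics: the child starts with the master's preloaded state, and its modifications do not affect the master.

```python
import os


def preload_then_fork():
    # Master process: pay the loading cost once (stand-in for class preloading).
    preloaded = {"java.lang.Object": "class data", "java.lang.String": "class data"}

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # child starts as a clone of the master's memory space
    if pid == 0:
        # Child: the preloaded state is already there, nothing is reloaded.
        os.close(read_fd)
        inherited = len(preloaded)
        # Writing gives the child its own private copy of the affected pages.
        preloaded["app.Main"] = "class data"
        os.write(write_fd, f"{inherited},{len(preloaded)}".encode())
        os.close(write_fd)
        os._exit(0)

    # Master: collect the child's report, then look at its own state again.
    os.close(write_fd)
    report = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        report += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    inherited, child_total = (int(n) for n in report.decode().split(","))
    return inherited, child_total, len(preloaded)
```

The function returns how many entries the child inherited, how many it had after its own modification, and how many the master still has afterwards.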

RE38,104 / 1999

1. In a computer system comprising a program in source code form, a
method for generating executable code for said program and resolving
data references in said generated code, said method comprising the steps
of: a) generating executable code in intermediate form for said program
in source code form with data references being made in said generated
code on a symbolic basis, said generated code comprising a plurality of
instructions of said computer system; b) interpreting said instructions,
one at a time, in accordance to a program execution control; c)
resolving said symbolic references to corresponding numeric references,
replacing said symbolic references with their corresponding numeric
references, and continuing interpretation without advancing program
execution, as said symbolic references are encountered while said
instructions are being interpreted; and d) obtaining data in accordance
to said numeric references, and continuing interpretation after
advancing program execution, as said numeric references are encountered
while said instruction are being interpreted; said steps b) through d)
being performed iteratively and interleaving.
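Steps b) through d) can be illustrated with a toy interpreter in Python. This is a sketch under assumptions: the opcode names (`load_symbolic`, `load_numeric`, `add`), the symbol table and the memory layout are all invented here and are not the JVM's or the patent's.

```python
def run(program, field_names, memory):
    """Interpret `program`, a mutable list of [opcode, operand] instructions.

    field_names: hypothetical symbol table mapping a name to a slot in memory.
    """
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "load_symbolic":
            # Step c): resolve the symbolic reference, rewrite the instruction
            # in place, and do NOT advance pc, so the same slot is interpreted
            # again in its numeric form.
            program[pc] = ["load_numeric", field_names[arg]]
            continue
        if op == "load_numeric":
            # Step d): obtain the data via the numeric reference, then advance.
            stack.append(memory[arg])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return stack
```

After one run, every `load_symbolic` that was reached has been replaced by a `load_numeric`, so a second run over the same program performs no name lookups at all.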

This is simply dynamic shared library loading and usage, also well known and understood in the science of informatics since the early 80s. For evidence you could also look at Linux Kernel version 2.0 published 1996.

Looks like a p-code interpreter http://en.wikipedia.org/wiki/P-code_machine to me

Note: The author of this patent, James Gosling, who is not in favor of his patent being used in this manner anyway, might be worth asking for help in finding prior art.

6,910,205 / 2002

1. In a computer system, a method for increasing the execution speed of
virtual machine instructions at runtime, the method comprising:
receiving a first virtual machine instruction; generating, at runtime, a
new virtual machine instruction that represents or references one or
more native instructions that can be executed instead of said first
virtual machine instruction; and executing said new virtual machine
instruction instead of said first virtual machine instruction.

These are just-in-time-compilers. They do all of this and much more much better. "October 25, 1996 Sun announces first Just-In-Time (JIT) compiler for Java platform"

Possible Prior Art?: "Binary translation" technology and particularly Digital's "FX!32", which ran x86 Win32 code on Alpha and gradually converted it to native Alpha code as it was executed.

This is not referring to JIT technology or binary translation. It's more like dynamic optimisation.

It is referring to JIT. It literally talks about receiving instructions and translating them to native code equivalents on the fly.

Forth anyone?

Forth implementations usually compile directly from source code either to threaded code (i.e. "byte-code") or to machine code, which won't constitute prior art. However, Forth engine optimizations like superinstructions (peephole optimization) and static stack caching might. I know some papers by Anton Ertl which predate the patent. Things that smell like prior art are:

  • Stack caching for interpreters:

    Dynamic stack caching is a pure run-time method, i.e., the interpreter maintains the state of the cache and the compiler need not be aware of it. This means that there is a copy of the whole interpreter for every cache state. The execution of an instruction can change the state of the cache, and the next instruction has to be executed in the copy of the interpreter corresponding to the new state.

    (i.e. we have multiple copies of VM instructions, and at every interpreter dispatch we compute (i.e. "generate") the correct VM instruction for the current stack state, and execute this one instead).
  • vmgen - A Generator of Efficient Virtual Machine Interpreters

    The current approach to combining instructions into superinstructions is a very simple peephole-optimizing approach: every invocation of gen inst (see Section 4.5) checks if the new instruction can be combined with the last instruction into a superinstruction; of course, the last instruction can already be a superinstruction.

    (i.e. during compilation, which in Forth systems very well occurs at "run-time", we sometimes look back at the generated instructions, and replace them with corresponding superinstructions. "Instructions" in vmgen are just code-addresses of the machine code corresponding to an interpreter primitive, so they "reference native instructions").
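The peephole approach in the vmgen quote can be sketched in Python as follows. The function name `gen_inst` echoes the quoted "gen inst", but the code is a simplified illustration and not vmgen's actual interface; the opcode names and the combination table are invented for this example.

```python
# Hypothetical table: (previous opcode, new opcode) -> superinstruction.
SUPERINSTRUCTIONS = {
    ("lit", "add"): "lit_add",
    ("lit_add", "store"): "lit_add_store",
}


def gen_inst(code, opcode, *operands):
    """Append an instruction, folding it into the previous one when possible."""
    if code:
        prev_opcode, *prev_operands = code[-1]
        combined = SUPERINSTRUCTIONS.get((prev_opcode, opcode))
        if combined is not None:
            # The previous instruction may itself already be a superinstruction.
            code[-1] = (combined, *prev_operands, *operands)
            return
    code.append((opcode, *operands))


code = []
gen_inst(code, "lit", 5)
gen_inst(code, "add")
gen_inst(code, "store", "x")
print(code)  # one combined instruction instead of three
```

Each call looks only at the last generated instruction, so the replacement happens while the code is being produced, which in a Forth system can be at run-time.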

Any virtual Turing machine based on Alan Turing's 1937 paper on computable numbers, e.g. http://www.youtube.com/watch?v=E3keLeMwfHY. A number were produced and executed in the 1950s and 1960s.

  • IBM 1401 Autocoder with Macro facilities, in about 1959
  • IBM 1401 Autocoder running on a System 360 running VM/CMS, in the early 1970s
  • Pascal for the Apple IIe, which compiled to p-Code, in 1978
  • A number of language implementations based on an interpreted virtual machine, including Waterloo APL, produced by the University of Waterloo in the early 1980s and targeted at the IBM PC

-> As possible prior art, Smalltalk implementation as described in the paper "Efficient implementation of the smalltalk-80 system" by L. Peter Deutsch et al. (1983)

Another possibility is LISP. In the book "Lisp Programming" by I. Danicic (Blackwell Scientific Publications, 1983) it says (page 81) "The Lisp system usually consists of a Lisp interpreter, but the more ambitious systems also incorporate a compiler.....the compiler then in effect replaces the relevant EXPRs by SUBRs and FEXPRs by FSUBRs and the corresponding properties by machine code functions. As far as the user is concerned this should make no difference (apart from the increased speed)". This is specifically talking about the partial compilation of the intermediate language because it goes on to say "A Lisp program, whether partly compiled or not, ....". This happens at runtime (rather than in a pre-compilation phase as with a compiled language), it acts on the intermediate code, and only some parts of the program are improved in this manner: all characteristics of the patent.

6,061,520 / 1998

1. A method in a data processing system for statically initializing an
array, comprising the steps of:

compiling source code containing the array with static values to
generate a class file with a clinit method containing byte codes to
statically initialize the array to the static values;

receiving the class file into a preloader;

simulating execution of the byte codes of the clinit method against a
memory without executing the byte codes to identify the static
initialization of the array by the preloader;

storing into an output file an instruction requesting the static
initialization of the array; and

interpreting the instruction by a virtual machine to perform the static
initialization of the array.
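The steps of this claim can be sketched in Python to make the discussion below more concrete. The byte code names (`newarray`, `dup`, `push`, `astore`, `putstatic`), the emitted `init_array` instruction and both function names are invented for this illustration, and the sketch assumes well-formed input. The preloader walks the byte codes against a simulated stack and array without running them in a VM, then emits a single instruction that the VM interprets.

```python
def preload(clinit):
    """Simulate the clinit byte codes and emit one array-initialization
    instruction if they do nothing but fill a static array with constants.
    Returns the byte codes unchanged if the pattern is not recognised."""
    stack, values = [], None
    for op, *args in clinit:
        if op == "newarray":
            values = [0] * args[0]      # simulated memory for the array
            stack.append("array")
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "push":
            stack.append(args[0])
        elif op == "astore":
            value, index, _array = stack.pop(), stack.pop(), stack.pop()
            values[index] = value
        elif op == "putstatic" and stack == ["array"]:
            return [("init_array", args[0], tuple(values))]
        else:
            return list(clinit)
    return list(clinit)


def interpret(instructions, statics):
    """Toy VM that only knows the instruction emitted by the preloader."""
    for op, *args in instructions:
        if op != "init_array":
            raise ValueError(f"unsupported instruction {op!r}")
        field, values = args
        statics[field] = list(values)
```

A clinit sequence that stores three constants into a new array therefore shrinks to one `init_array` instruction carrying the three values.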

OK, this one sounds like crap. It's so specialised that it probably can't be rebuilt by anyone at all, and Google could never have used this if Sun hadn't open sourced Java and released it under the GPL itself ..

Or maybe constant-folding in any C compiler when initializing an array?

It looks to me like this is an automatic refactoring at runtime. There's a piece of bytecode which fills an array with constants; this is detected somehow and replaced with a call to some optimised native routine which fills the array. That routine might be memcpy() from a pre-filled array, or it might often be bzero(). This kind of facility is built into standard executable file formats, so the only "new" thing may be that it's being done at runtime, in which case it's just a standard function of the JIT, or else a specific example of self-modifying code. The slightly bizarre thing is that this is clearly not expected to be done at compile time, as any sane engineer would assume.

Prior art is difficult to prove

In the above discussion, I very often see terms like:

  • User-IDs are not Process-IDs
  • Processes are not Classes

-> One should know that computer scientists learn at university to abstract from these. It's basically the same as with people who have a driver's license: if you learned at driving school to drive to place A, you'll probably be able to drive to place B on your own afterwards. That is because you learned to abstract from the destination. It's the same with computer scientists and those patents.

Patent obviousness should really be measured by standard university specialists:

An obvious non-invention is one where a computer scientist can come to the same solution when faced with the same problem, only by applying several abstractions and/or transferring knowledge from another problem domain to the one they are currently faced with -- this is the exact thing that computer scientists are trained for !!

-> According to this, most of the above patents really are obvious!

Related pages on ESP Wiki

External links

Analyses by related projects soon afterwards

Prior art suggestions

(You can also help by contributing higher up on this page: #Searching for prior art)

In 1991, a company called First Pen Systems briefly sold a product that supported on-demand dynamic loading of compiled and interpreted classes from shared libraries and text files. The product ran on top of GO Corporation's PenPoint Operating System. Perhaps this would be useful for prior art claims. For further information, contact David Beberman at dbeberman@gmail.com.

Articles by journalists from 13 Aug 2010

Early October, Google officially responds

Background

References